Abstract

As analysis of imagery and environmental data plays a greater role in mission construction and execution, there is an increasing need for autonomous marine vehicles to transmit this data to the surface. Without access to the data acquired by a vehicle, surface operators cannot fully understand the state of the mission. Communicating imagery and high-resolution sensor readings to surface observers remains a significant challenge; as a result, current telemetry from free-roaming autonomous marine vehicles remains limited to 'heartbeat' status messages, with minimal scientific data available until after recovery. Increasing the challenge, long-distance communication may require relaying data across multiple acoustic hops between vehicles, yet fixed infrastructure is not always appropriate or possible.

In this thesis I present an analysis of the unique considerations facing telemetry systems for free-roaming Autonomous Underwater Vehicles (AUVs) used in exploration. These considerations include high-cost vehicle nodes with persistent storage and significant computation capabilities, combined with human surface operators monitoring each node. I then propose mechanisms for interactive, progressive communication of data across multiple acoustic hops. These mechanisms include wavelet-based embedded coding methods and a novel image compression scheme based on texture classification and synthesis. The specific characteristics of underwater communication channels, including high latency, intermittent communication, the lack of instantaneous end-to-end connectivity, and a broadcast medium, inform these proposals. Human feedback is incorporated by allowing operators to identify segments of data that warrant higher-quality refinement, ensuring efficient use of limited throughput. I then analyze the performance of these mechanisms relative to current practices.

Finally, I present CAPTURE, a telemetry architecture that builds on this analysis. CAPTURE draws on advances in compression and delay-tolerant networking to enable progressive transmission of scientific data, including imagery, across multiple acoustic hops. In concert with a physical layer, CAPTURE provides an end-to-end networking solution for communicating science data from autonomous marine vehicles. Automatically selected imagery, sonar, and time-series sensor data are progressively transmitted across multiple hops to surface operators. Human operators can request arbitrarily high-quality refinement of any resource, up to an error-free reconstruction. The components of this system are then demonstrated through three field trials in diverse environments on SeaBED, OceanServer and Bluefin AUVs, each with a different software architecture.

Thesis Supervisor: Dr. Hanumant Singh
Title: Associate Scientist, WHOI

CHAPTER 1


Introduction


This thesis presents a novel method for communicating scientific telemetry from underwater vehicles. The contributions of this thesis include: a multi-hop relay protocol incorporating advances from the field of Delay Tolerant Networking (DTN) and designed for Autonomous Underwater Vehicles (AUVs); a method for incorporating human feedback into the selection of science telemetry; identification of, and extensions to, compression techniques well suited to underwater data; a novel compression scheme based on texture classification and synthesis; an architecture for AUV telemetry that integrates these advances; and the demonstration of the architecture's viability through a prototype system and multiple field tests on diverse vehicles.

This telemetry architecture, nicknamed CAPTURE, has been designed to enable progressive communication of rich scientific data from underwater vehicles to human operators on the ocean surface, across a sequence of free-swimming relay vehicles. Progressive transmission provides operators with a gradually improving approximation to environmental data, sonar imagery, or photographs over the course of a normal mission. Operator feedback can be used to obtain arbitrarily high-quality refinement of specific sections of interesting data, up to an error-free reconstruction. The use of multiple relay vehicles allows efficient long-distance communication, even with contemporary fixed-power acoustic modems.
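Progressive transmission can be illustrated with a toy example. The Python sketch below is not the coder used by CAPTURE (the thesis relies on wavelet-based embedded coding methods such as SPIHT); it only sends integer samples one bit plane at a time, most significant plane first, so that each additional plane tightens the reconstruction and the final plane makes it exact. The function names and the midpoint fill-in rule are illustrative assumptions.

```python
def bit_planes(values, num_bits):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]


def reconstruct(planes, num_bits, count):
    """Approximate the original values from the bit planes received so far.

    Bits that have not arrived yet are replaced by the midpoint of the
    remaining uncertainty interval, so the error never exceeds half of it.
    """
    values = [0] * count
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i              # weight of this bit plane
        for j, bit in enumerate(plane):
            values[j] |= bit << shift
    missing = num_bits - len(planes)
    if missing > 0:
        values = [v + (1 << (missing - 1)) for v in values]
    return values


if __name__ == "__main__":
    samples = [200, 13, 97, 255, 0, 128]      # e.g. 8-bit sensor readings or pixels
    NUM_BITS = 8
    planes = bit_planes(samples, NUM_BITS)
    for k in range(1, NUM_BITS + 1):
        approx = reconstruct(planes[:k], NUM_BITS, len(samples))
        err = max(abs(a - s) for a, s in zip(approx, samples))
        print(f"{k} bit plane(s) received: max error {err}")
```

After k of 8 planes, each sample is known to lie in an interval of width 2^(8-k), and filling the unknown bits with the interval midpoint bounds the error by half that width; an operator can stop early or keep requesting planes until the data is error-free.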

1.1 Autonomous underwater exploration


For those who study the ocean, AUVs provide unique capabilities to explore what can be an extremely forbidding environment. Free of a physical surface tether, AUVs are able to perform surveys without human intervention, many kilometers from a surface vessel. The Sentry[132] AUV, developed at Woods Hole Oceanographic Institution (WHOI), is capable of traveling nearly one hundred kilometers on a single charge, and the Tethys vehicle developed at Monterey Bay Aquarium Research Institute (MBARI) is expected to have a range of nearly two thousand kilometers through extensive optimization of the sensor suite and hydrodynamics. AUVs enable scientists from across the oceanographic disciplines to answer questions about the health of our nation's fisheries[121] or learn the secrets of ancient ship and airplane wrecks[35]. AUVs have operated in environments as diverse as lively Puerto Rican coral reefs[4], the world's longest aqueduct[112], hydrothermal vents along the mid-ocean ridges[133], and the Arctic seafloor[107].

This independence from a surface ship is a significant asset in polar environments, where surface movement of any sort is challenging and slow work. AUVs have proven to be particularly effective tools for under-ice research, since they can range freely for great distances under the ice. Recently intensified interest in the polar regions has driven a number of AUV missions in both the Arctic and Antarctic[75, 59, 60]. Fig. 1-1 shows the SeaBED AUV preparing to survey the underside of an Antarctic ice floe in 2010. While the earliest of these missions involved skirting the edges of ice floes, autonomous vehicles now venture farther and farther under the ice from their launch point as climate scientists seek answers to vexing questions about the causes and progress of climate change[3].

Figure 1-1: The SeaBED AUV preparing for an under-ice mission in Antarctica during the 2010 IceBELL expedition.

The Nioghalvfjerdsfjorden Glacier, shown in Fig. 1-2, poses just such questions. The melting of Greenland ice sheets, driven by climate change, currently accounts for nearly one millimeter of sea-level rise each year[68]. A significant driver of this melting is believed to be warm subtropical waters at the ocean / ice / land triple point[113], yet efforts to characterize these processes by any means are hampered by an ice thickness of 100m or more, as shown in Fig. 1-2a. The nearest feasible access point for a vehicle is a small rift, shown in Fig. 1-2b, tens of kilometers away. An AUV could be inserted there and travel horizontally to the area of interest. Should that AUV become entrapped, or suffer a mechanical failure under the one-hundred-meter-thick ice tongue, the environmental data it collected would likely be irrecoverable. This is in stark contrast with other field exploration robots, such as the Mars rovers[15, 2], which have returned incredibly valuable scientific data despite every vehicle remaining behind on the Martian surface.

1.2 Operator involvement

AUVs, by their very nature, do not require active human intervention to complete a mission. As the mission progresses, AUVs typically transmit small 'heartbeat messages' containing the current status of the mission and health of the vehicle. These messages are closely watched in real time by human operators to ensure the safety of the vehicle, but they have historically been of little scientific value. Communication constraints mean that the vast majority of science data is not available until the vehicle has surfaced at the completion of the mission. If AUVs were able to communicate acquired science data to surface operators, in a manner such as this thesis enables, operators would be able to review some of that data before mission completion and adjust the goals of the vehicle while it is still deployed and near the regions of interest. While the vehicle is operating normally, these shipboard operators currently represent an underutilized resource, waiting hours for the vehicle to return.

(a) Cross-section, Nioghalvfjerdsfjorden Glacier  (b) Rift, Nioghalvfjerdsfjorden Glacier

Figure 1-2: (Left) Cross section of the glacier showing the ocean-ice-land triple point, circled in red, and the rift where AUVs could be inserted. (Right) The rift in the glacier up close. The rift is a crack in the roughly 100m thick floating ice tongue (walls visible on either side), which is covered in a 10 inch layer of sea ice. Photo courtesy Eric Philips, IceTrek.

The opportunity presented by involving surface operators more closely in AUV missions was recently demonstrated during an environmental survey of the 2010 Deepwater Horizon oil spill by the Sentry AUV. During that mission, individual readings from an onboard mass spectrometer were transmitted to the surface ship[53]. Previous work by this author[74] was used for the rendering and display of the data as it was received. Even this limited, highly subsampled view of the data informed site selection and survey design, guided the choice of locations for further study with other instruments, enabled real-time survey modification, and provided the first visual confirmation of a coherent subsea oil plume, all while the vehicle was underwater. This capability remains rare; this thesis extends it to imagery and high-quality scalar data, while enabling multi-hop communication for more challenging environments.

Figure 1-3: Screenshot of data communicated to the surface during an autonomous survey of the environmental impacts of the Deepwater Horizon disaster, integrated with other sources and overlaid on satellite photographs in Google Earth. White lines indicate data from the shipboard acoustic Doppler current profiler. Vertical dark blue lines indicate data collected over the preceding days with CTD casts, and the zigzag located at center indicates mass spectrometer data telemetered acoustically from the AUV in real time.


1.3 Motivation for this work

Telemetry from AUVs historically has been limited to a small and predefined set of vehicle state information, such as the position, interspersed with the occasional scalar measurement from one or two simple sensors. This level of communication has proven adequate (if unsatisfying) when the only decision facing an operator is whether or not to abort the mission of a single vehicle. Missions, however, may now involve multiple vehicles working towards a set of goals, in dangerous and unconstrained environments such as under ice, operating great distances from surface ships. There is an increasing need for human operators to have access to the data gathered by an AUV prior to any planned recovery. Specific benefits include:

New opportunities: High-risk exploration missions, like the proposed Nioghalvfjerdsfjorden glacier mission, are currently impractical given the value of an AUV and the real possibility of learning nothing from the mission if the vehicle is lost. Even for more traditional missions, AUV recoveries from the open ocean are challenging and risky for both vehicles and operators. Returning scientific data prior to the end of a mission would lower the risk of failure, and thereby enable missions that are currently impractical or impossible.

Financial incentive: Vehicles may take hours to ascend from missions in the deep sea, and deployment and recovery of an AUV may each take an hour. Since a single day of sea time on a fully staffed oceanographic ship can cost $25,000, and an icebreaker upwards of $100,000 per day, maximizing the scientific return on each mission is critical. Observing a subset of the vehicle's data prior to recovery could suggest small adjustments to the current mission with potentially large payoffs, or allow planning to begin for future missions.

Improved autonomy: As advanced mission executives, such as MOOS-IvP[9], T-REX[87], and DAMN[92, 93], enable complex subsea analysis of gathered data, communicating that data to the surface becomes (perhaps counter-intuitively) more important. One two-year study of interactions between human operators and Zoe, a field robot deployed in the Atacama Desert, found that as the level of vehicle autonomy increased over the years, users needed significantly more transparency into the robot's decision-making processes; operator questions changed from "what happened" to "why is it doing this"[114].

Components of this work offer significant benefit during more traditional AUV missions as well. The photomosaic of the World War II torpedo bomber shown in Fig. 1-4, generated by the author, consists of images captured during an AUV dive in the Channel Islands National Marine Sanctuary. This dive was the third attempt to capture imagery of that site: the first dive was ruined by a faulty strobe, and the second by a disconnected cable. These types of errors, while avoidable, are difficult to completely prevent due to the complexity of AUVs. Had previews of the imagery been available during the mission, it could have been cancelled early rather than wasting expensive ship time. Relaying this data to the surface requires advances in the current state of both compression for AUV telemetry and communication.

Figure 1-4: Photomosaic of the submerged wreck of an Avenger torpedo bomber, lost in the Channel Islands National Marine Sanctuary and Park. Raw images courtesy NOAA Northwest Fisheries Science Center.

1.4 Wireless communication underwater

While typical land- or air-based robots might communicate data to human operators using high-frequency electromagnetic signaling, such as radio modems or 802.11 "WiFi", electromagnetic radiation is quickly attenuated by water. Table 1.1 lists the current viable methods for underwater communication. Unfortunately, the freedom of AUVs to operate without a physical tether comes at a cost. Whereas tethered vehicles, like the JASON II[31] or Nereus[12] Remotely Operated Vehicles (ROVs), deliver environmental data and imagery to surface operators in real time, AUV sensor data is typically inaccessible until after the vehicle has been recovered.


Method            Throughput (kbps)   Range   Long Range   Free Motion
Acoustic Modem    0.01-0.5            6km     Yes          Yes
Acoustic Tether   1-15                6km     Yes          No
Optical Modem     5,000               100m    No           Yes
Physical Tether   1,000,000           12km    Yes          No

Table 1.1: Viable communication options for underwater vehicles.


(a) Acoustic Modem  (b) Acoustic Tether

Figure 1-5: (Left) A vehicle equipped with an acoustic modem. Note the omnidirectional beam pattern, allowing communication over great horizontal distances. (Right) A vehicle equipped with an acoustic tether. Note the narrow beam pattern, yielding high bandwidth but requiring vertical communication.


While there have been advances in high-bandwidth, short-range (< 100m) optical modems, acoustic communication remains the only viable option for wireless underwater communication over multiple kilometers. In addition to wireless communication methods, the recently developed Nereus[12] vehicle at WHOI spools out tens of kilometers of fine fiber-optic cable, capable of supporting communication without the weight of a traditional tether. While an exciting development, this solution still limits the vehicle to operations in the proximity of a ship, and is impractical for multiple vehicles. In addition, fiber-optic tethers are currently usable only a single time, and are then discarded into the ocean as trash. Should the fiber tether be severed, the vehicle must fall back on purely acoustic communication methods.

Acoustic tethering, a specific method of communicating with an acoustic modem, relies on transceivers with narrow beams and high bandwidth, where the surface ship is located directly above the vehicle as shown in Fig. 1-5. This vertical relationship and focused transducer beam significantly limit the effects of multipath interference, drastically improving the quality of the communication link. It also imposes specific constraints on the geometry of the undersea vehicle and the surface ship. This geometry is impractical for many autonomous operations, nearly impossible for those in polar environments, and does not scale easily to multiple vehicles. This thesis instead assumes the use of a relay chain of AUVs for long-range horizontal communication.


1.5 Telemetry Encoding

Long-range horizontal acoustic communication underwater is largely limited to rates of up to hundreds of bits per second[108], but low throughput is far from the only obstacle in underwater communications. The speed of sound in seawater is approximately 1.5 kilometers per second, which leads to packet latencies of several seconds. Additionally, while traditional 10/100 Ethernet breaks data into packets of up to 1500 bytes, acoustic modems may require fragmenting data into fixed-length packets as small as 32 bytes to facilitate encoding. Transmitting large chunks of data, such as imagery, requires heavily fragmenting it and reunifying these fragments on the surface. Since any single fragment may be lost, the system must be resilient to lost fragments or provide a way to request retransmission. Given the high latencies involved, requesting retransmissions after every packet results in highly inefficient communication between the recipient and transmitter, where the majority of both nodes' time is spent waiting for packets to arrive.
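The fragmentation and retransmission trade-off can be sketched in a few lines of Python. The 2-byte header layout (message id plus fragment index), the 80 bps rate and the 3 km range are hypothetical values chosen for illustration, not the framing of any particular modem. The receiver reports all missing fragments at once, and `utilization` compares acknowledging every frame against acknowledging a burst of frames when each round trip costs seconds of acoustic propagation.

```python
import math

FRAME_BYTES = 32                 # fixed frame size, matching the 32-byte example in the text
HEADER_BYTES = 2                 # hypothetical header: message id + fragment index
PAYLOAD_BYTES = FRAME_BYTES - HEADER_BYTES


def fragment(msg_id, data):
    """Split data into fixed-length frames tagged with (msg_id, index)."""
    count = math.ceil(len(data) / PAYLOAD_BYTES)
    if count > 256:
        raise ValueError("a 1-byte fragment index allows at most 256 fragments")
    frames = []
    for i in range(count):
        chunk = data[i * PAYLOAD_BYTES:(i + 1) * PAYLOAD_BYTES]
        # Pad the last chunk so every frame has the same fixed length.
        frames.append(bytes([msg_id & 0xFF, i]) + chunk.ljust(PAYLOAD_BYTES, b"\x00"))
    return frames


def reassemble(frames, count, total_len):
    """Return (data, missing); data is None until all fragments have arrived.

    `missing` lists every absent fragment index so that one batched
    retransmission request can replace per-frame acknowledgments.
    Assumes all frames belong to a single message of known length.
    """
    pieces = {f[1]: f[HEADER_BYTES:] for f in frames}
    missing = [i for i in range(count) if i not in pieces]
    if missing:
        return None, missing
    return b"".join(pieces[i] for i in range(count))[:total_len], []


def utilization(frames_per_ack, frame_bits, rate_bps, distance_m, sound_speed=1500.0):
    """Fraction of time the link spends sending data.

    Each burst of `frames_per_ack` frames is followed by one acoustic round
    trip before the next burst; acknowledgment airtime is ignored.
    """
    burst = frames_per_ack * frame_bits / rate_bps
    round_trip = 2 * distance_m / sound_speed
    return burst / (burst + round_trip)


if __name__ == "__main__":
    image = bytes(range(256)) * 4                          # 1024-byte stand-in for an image
    frames = fragment(7, image)
    received = [f for f in frames if f[1] not in (3, 10)]  # two frames lost in transit
    data, missing = reassemble(received, len(frames), len(image))
    print("request retransmission of:", missing)
    data, missing = reassemble(received + [frames[3], frames[10]], len(frames), len(image))
    print("complete:", data == image)
    print(utilization(1, 256, 80, 3000), utilization(16, 256, 80, 3000))
```

With these assumed numbers, per-frame acknowledgment keeps the link busy less than half the time, because the 4 s round trip exceeds the 3.2 s needed to send one 256-bit frame; batching sixteen frames per acknowledgment amortizes that wait.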

Common terrestrial protocols are poorly suited for underwater use without adaptation on a number of fronts. The required header for a single User Datagram Protocol (UDP) packet (the more minimalist of the two transport protocols used by the majority of internet traffic) would alone consume three-quarters of the standard 256-bit frame used by default on the WHOI Micro-Modem. Data encoding is typically performed using the Compact Control Language (CCL), which defines a set of short messages typically needed by AUVs. While there currently exist no transport or application layer protocols in widespread use for underwater vehicles, there has been significant research on higher networking layers[19]. Numerous Media Access Control (MAC) protocols, including MACA[57], MACAW[11], FAMA derivatives[37, 71], Aloha derivatives[101], and others[116] have been developed to mediate between multiple communicating nodes. Some protocols provide for single transmissions, whereas others allow long periods of time to be reserved by a vehicle for data transmission, amortizing the cost of a traditional RTS/CTS exchange over a longer transmission[17, 84]. Many recent protocols incorporate knowledge of vehicle location into the MAC process, to allow estimation of latencies[79] and tuning of transmission power[136]. Recent research has shown that modulating the power of the transmitter based on the required transmission distance is an efficient way to minimize the energy spent successfully transmitting each bit across simulated networks. In [77], a decentralized neighbor discovery protocol is presented that builds on this cross-layer design by reaching out to nearby neighbors before expending power to communicate more broadly. In practice, most AUVs rely on some form of time-based multiplexing to allow for acoustic navigation and sensing. Few field experiments have involved multiple AUVs acting as relays.
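Why distance-dependent power control, and by extension shorter relay hops, saves energy can be illustrated with a standard textbook propagation model. The sketch below assumes Thorp's empirical absorption formula, a practical-spreading exponent of k = 1.5, a hypothetical 25 kHz carrier and a fixed received-level target per hop; it ignores noise, retransmissions and MAC overhead, so it indicates a trend rather than the behavior of any specific protocol cited above.

```python
import math


def thorp_absorption_db_per_km(f_khz):
    """Thorp's empirical absorption coefficient in dB/km (frequency in kHz)."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def path_loss_db(distance_m, f_khz, k=1.5):
    """Spreading plus absorption loss in dB, relative to a 1 m reference.

    k = 1.5 ("practical spreading") is an assumption; k = 1 corresponds to
    cylindrical and k = 2 to spherical spreading.
    """
    spreading = k * 10 * math.log10(distance_m)
    absorption = thorp_absorption_db_per_km(f_khz) * distance_m / 1000.0
    return spreading + absorption


def relay_energy(total_m, hops, f_khz):
    """Relative transmit energy (linear units) for equally spaced hops, when
    every hop must deliver the same received level. Ignores noise variation,
    retransmissions and relay overhead."""
    return hops * 10 ** (path_loss_db(total_m / hops, f_khz) / 10)


if __name__ == "__main__":
    single = relay_energy(6000, 1, 25)        # hypothetical 6 km link at 25 kHz
    for hops in range(1, 5):
        ratio_db = 10 * math.log10(relay_energy(6000, hops, 25) / single)
        print(f"{hops} hop(s): {ratio_db:+.1f} dB relative to a single hop")
```

Because absorption grows linearly with distance in decibels, the required transmit power grows exponentially with distance in linear terms, so splitting a 6 km link into shorter hops lowers total energy in this model even though more transmissions are made.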

While there are no known examples of underwater systems for relaying high-bandwidth imagery data across multiple AUV hops, there is extensive work in related areas. A survey of the relevant point-to-point telemetry research is given in Table 1.2. The papers are grouped into four categories. Papers in the first group are the most widely used in the field, and consist of simple one-packet messaging systems. The second group of papers describe a source coding (compression) method, optimized in some way for underwater imagery, without describing any particular communication technique. Each of the methods in this second group could be used for compression in conjunction with CAPTURE, but is not a telemetry solution in and of itself. The third group of papers combine source and channel coding. These approaches to transmission apply forward error correction (FEC) in such a way that errors result in the image degrading gracefully, rather than preventing decoding of the packet. These methods are the ones used with special-purpose modems, such as acoustic tether systems, and are incompatible with the general-purpose acoustic modems used by most AUVs. The final group, error-tolerant approaches to source coding, break data across multiple frames to increase robustness to lost packets. The data is restructured such that more important segments of data have higher amounts of protection than less important segments.
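The intuition behind unequal protection of a progressive stream can be shown with a small calculation. In this Python sketch, simple repetition stands in for the FEC schemes used in the cited work (EREC and ULP are considerably more sophisticated), copies are assumed to be lost independently, and a packet is only useful if every earlier packet in the stream also arrived. The function name is hypothetical.

```python
def expected_prefix(loss_prob, copies):
    """Expected number of leading packets of a progressive stream that decode.

    Packet i is sent copies[i] times; each copy is lost independently with
    probability loss_prob. A progressive decoder can use packet i only if
    packets 0..i-1 also arrived, so the expectation is a sum of the
    probabilities that each prefix arrived intact.
    """
    expected = 0.0
    prefix_ok = 1.0
    for c in copies:
        prefix_ok *= 1 - loss_prob ** c     # at least one copy of this packet survived
        expected += prefix_ok
    return expected


if __name__ == "__main__":
    budget_front = [3, 2, 1, 1]             # extra copies protect the most important data
    budget_back = [1, 1, 2, 3]              # same seven transmissions, spent at the end
    print(expected_prefix(0.3, budget_front), expected_prefix(0.3, budget_back))
```

With a 30% loss rate, spending seven transmissions as [3, 2, 1, 1] yields about 2.91 decodable packets on average, versus about 2.07 for the same budget spent as [1, 1, 2, 3].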

A survey of relevant multi-hop research follows in Table 1.3. These multi-hop papers are grouped into three categories. The first category is those systems designed for data collection from static nodes. While these approaches share some of the same MAC concerns as more complicated multi-hop networks, the vehicle communicates with each node independently in a point-to-point manner, with no routing across multiple hops. The second class of papers are those designed for communicating data from AUVs, and the third class are those systems that have been implemented. I have included in this final class two significant field experiments in Delay-Tolerant Networking, one using buses as the mobile nodes, and the other using zebras.

The routing method for each of the multi-hop papers is classified as either forwarding-based or replication-based. Forwarding-based methods rely on strict transfer of data from one node to another node, along a single route towards the receiver. These approaches employ various strategies to ensure the transfer from node to node, but are fragile to the loss (permanent or temporary) of a single node. Replication-based methods rely on broadcasting data to multiple receivers, employing multiple possible paths to the receiver.
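The fragility of forwarding relative to replication can be quantified under a deliberately simple model: each relay survives (stays operational and in contact) independently with probability p, and replication is idealized as sending copies over node-disjoint routes of equal length. The function names and the independence assumption are illustrative, not taken from the cited papers.

```python
def forwarding_delivery(p_node, relays):
    """Single route: the message arrives only if every relay on it survives."""
    return p_node ** relays


def replication_delivery(p_node, relays, paths):
    """Copies travel over `paths` node-disjoint routes of `relays` relays each;
    delivery succeeds if at least one route survives intact."""
    return 1 - (1 - p_node ** relays) ** paths


if __name__ == "__main__":
    for paths in (1, 2, 3):
        print(paths, replication_delivery(0.9, 3, paths))
```

For example, with p = 0.9 and three relays per route, a single forwarding route delivers with probability 0.729, while two disjoint replicated routes deliver with probability of roughly 0.93. The model ignores the extra transmissions replication costs, although on a broadcast acoustic channel a single transmission can reach several neighbors at once.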

Reference | Physical Layer | Progressive Source Coding | Payload Type | Routing

This Work | Conventional Modem | Yes - SPIHT or Other Progressive | Imagery | Yes

Scalar Telemetry Systems: No support for data fragmentation
Schneider and Schmidt[98] | Conventional Modem | No - (D)CCL | Single-Packet | No
Jakuba[52] | Conventional Modem | No - (D)CCL | Single-Packet | No
Webster et al.[127] | Conventional Modem | No - (D)CCL | Single-Packet | No
Marques et al.[65], Martins et al.[67] | Conventional Modem | No - Custom | Single-Packet | No
Rajala et al.[86] | Conventional Modem | No - Custom | Single-Packet | No
Smith et al.[106] | Conventional Modem | No - Custom | Single-Packet | No

Suggested Compression Techniques: Unimplemented, but designed for underwater telemetry
Li et al.[61] | Simulation | Yes - WDR-like | Imagery | No
Walker et al.[126] | Simulation | Yes - WDR-like | Imagery | No
Hoag and Ingle[46], Hoag et al.[47] | Simulation | No - Wavelet VQ | Imagery | No

Joint Source / Channel Coding: Single-purpose point-to-point links, may impose specific node geometries
Beaujean and Carlson[8] | Short-Range Acoustic Tether | No - BCH | Sonar | No
Kristensen and Vestgard[58] | 2kbps Acoustic Tether | No - Raw | Imagery | No
Suzuki et al.[115] | 16kbps Acoustic Tether | No - DCT (256px x 240px) | Imagery | No
Vail et al.[123] | OFDM In-Air Acoustic | No - MPEG4 / ERT | Video | No
Iglesias et al.[51] | Simulation | No - DT Analog Compressed Sensing | Imagery | No
Zhao and Cheng[134] | Simulation | Yes - SS-SPIHT | Imagery | No

Error Tolerant Source Coding: Source coding augmented with unequal FEC to guard against packet loss
Collins et al.[23], Collins and Atkins[22] | Simulation | Yes - SPIHT with EREC | Imagery | No
Mohr et al.[69], Mohr et al.[70] | Simulation | Yes - SPIHT with ULP | Imagery | No

Table 1.2: A Survey of Point-to-Point AUV Telemetry Systems. While this thesis does support multi-hop communication, it is included at the top of the table in gray for comparison. Gray cells in the table represent shared characteristics with this thesis.




Reference | Physical Layer | Source Coding | Payload Type | Routing

Data Collection: Mobile vehicles roaming between static nodes, collecting data
Hollinger et al.[49], Hollinger et al.[48] | Simulation | N/A | Single-Packet | None
Dunbabin et al.[27] | Short-range optical | Custom | Single-Packet | None

Misc Protocols: Selected protocols for communicating from AUVs
Zorzi et al.[137] | Simulation | N/A | N/A | Forwarding
Jones et al.[54] | Simulation | N/A | N/A | Replication
Nimbalkar and Pompili[76] | Simulation | N/A | N/A | Replication
Talavage et al.[117] | Simulation | N/A | N/A | Forwarding
Toni et al.[122] | Simulation | Progressive + UEP | Imagery | Forwarding

Implemented: Systems which have been used in the field
This Work | Conventional Modem | Progressive | Imagery | Replication
Xie and Gibson[130], Rice et al.[91], Rice and Green[88], Rice and Ong[89] | Conventional Modem | Varied | Varied | Forwarding
Goel et al.[43], Haag et al.[45], Benton et al.[10], Duarte et al.[26] | Conventional Modem | Unknown | Single-Packet | Replication
Balasubramanian et al.[6] | RF on 40 Buses | None | Random Data | Replication
Juang et al.[55] | RF on Wild Zebras | Unknown | Timeseries of Positions | Replication

Table 1.3: A Survey of Multi-hop AUV Telemetry Systems. This thesis is included in gray for comparison. Gray cells in the table represent shared characteristics with this thesis.


I highlight four different aspects of the papers. First, I identify the physical layer described in the paper. Most experiments have been performed as software simulations, but several have been implemented for use with either high-bandwidth acoustic tether systems or conventional, broadcast acoustic modems. Of the cited papers that are designed for conventional acoustic modems, only one set has exhibited the capability to communicate imagery or other data more complex than basic vehicle health. Those papers describe the set of experiments performed by Benthos and the Naval Postgraduate School as part of the U.S. Navy's SeaWEB[90] program. In contrast with my work, which relies entirely on networks of free-swimming AUVs, SeaWEB relies on a dense cellular network of many fixed seafloor nodes. Vehicles in the area of the network communicate with the nearest fixed node, which then relays data back to land via a fixed routing table. Data is relayed from fixed node to fixed node, with each attempting to immediately forward acquired data in the manner of a traditional terrestrial network. Should a node become disabled after accepting a transfer, there are no end-to-end guarantees or ways of working around the lost data.

When an end-to-end path is not immediately available, and nodes are moving relative to each other, replicating data rather than handing it off has several benefits, as described in greater detail in Chapter 2. This strategy, known as replication routing or store-and-forward routing, is used in this thesis, as well as in a set of papers describing work on the Solar AUV at AUSI. In that work, a vehicle moving between two portions of a partitioned network stored transmissions until they could be delivered to the second portion of the network. These transmissions were standalone messages containing vehicle states, which could be stored without any need for ordering or fragmentation.

1.6  Organization  of  this  thesis 

This thesis begins with an analysis of the need for, and characteristics of, delay-tolerant underwater multi-hop relay networks (Chapter 2). It then continues with a discussion of data compression techniques for AUVs, including novel approaches to scalar telemetry and image compression (Chapter 3). These contributions comprise key components of CAPTURE, my proposed architecture for AUV telemetry, which is compatible with multiple contemporary AUVs. Chapter 4 lays out the overall CAPTURE architecture and describes the integration of CAPTURE into existing vehicle platforms. Field results from three separate trials are presented in Chapter 5. Finally, Chapter 6 discusses limitations, future work, and conclusions.
CHAPTER  2


Multi-Hop  Relay  Communication 


As described in Chapter 1, underwater communication over long horizontal distances currently requires the use of acoustic modems. In this chapter, I analyze the benefits of using multiple AUV 'hops' to relay vehicle telemetry over long horizontal distances. Specifically, these benefits include increased communication efficiency and decreased power usage. I then analyze the challenges presented by communicating across such a sequence of relays, including high latencies, the lack of an instantaneous end-to-end path, and mobility of nodes. Finally, I propose an approach to relay communication tuned to the challenges and strengths of these AUV relay chains, including the presence of storage onboard the vehicle and the necessarily small number of nodes.

2.1  Small,  Multi-hop  Relay  Links 

The ocean imposes severe limitations on acoustic communication, including low available bandwidth and long propagation delays[1, 5, 108], which lead to frequent data corruption and high latencies. These communication challenges are made worse by operating over large distances, by heavy ship traffic in the area, by strong winds, and by the presence of multi-path interference. Despite these challenges, robust physical communication layers exist off-the-shelf in the form of acoustic modems from manufacturers including LinkQuest, Sonardyne, Teledyne Benthos, and WHOI. To correct for bit errors during transmission while minimizing power usage, acoustic modems typically offer a discrete set of pre-programmed Forward Error Correction (FEC) levels [85]. Users typically provide a short, fixed-length payload to the modem, which applies FEC and includes a checksum for verification on the receiving end. This data, now with added redundancy, is modulated and transmitted via a transducer into the water. In the case of general-purpose modems, FEC is applied uniformly to the transmitted message, without regard for the importance of individual bits.

For a fixed power and transmission bandwidth, the level of FEC applied prior to transmission determines the balance between throughput and reliability. When the data is received, it is demodulated and equalized before the modem attempts to decode it. If the data has been heavily corrupted, the errors will not be entirely correctable and the checksum will not match. In this case, most acoustic modems simply discard the received data. As a result, commercially available acoustic modems present a Binary Erasure Channel (BEC) to users: packets are either successfully received or lost. If the level of FEC is insufficient for the current channel, then communication may be extremely intermittent, with long periods of no connectivity. The percentage of these transmissions that are unsuccessful is the Frame Error Rate (FER). The WHOI Micro-Modem[36], as one example, can encode its data using spreading or block codes with varying levels of redundancy. As a result, transmitted packets may range from 32 bytes to 2048 bytes.
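Because the modem either delivers a frame intact or discards it, the link seen by the user behaves like an erasure channel at the packet level. The Python sketch below simulates such a channel under the simplifying assumption that each frame is erased independently with probability equal to the FER; the function names are illustrative and are not part of any modem interface.

```python
import random
from typing import List, Optional, Sequence


def erasure_channel(frames: Sequence[bytes], fer: float,
                    rng: random.Random) -> List[Optional[bytes]]:
    """Deliver each frame intact with probability 1 - fer; otherwise erase it.

    An erased frame is returned as None: the receiver learns nothing about
    its contents, just as a modem discards a frame whose checksum fails.
    Independent erasures are a simplifying assumption.
    """
    if not 0.0 <= fer <= 1.0:
        raise ValueError("fer must lie in [0, 1]")
    return [None if rng.random() < fer else frame for frame in frames]


def measured_fer(received: Sequence[Optional[bytes]]) -> float:
    """Fraction of transmitted frames that were erased."""
    if not received:
        raise ValueError("no frames were transmitted")
    return sum(frame is None for frame in received) / len(received)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are repeatable
    frames = [bytes([i % 256]) * 32 for i in range(10_000)]  # 32-byte frames
    received = erasure_channel(frames, fer=0.3, rng=rng)
    print(f"measured FER: {measured_fer(received):.3f}")
```

Real acoustic links tend to produce correlated losses (long outages rather than isolated drops), so an independent-erasure model is best read as a baseline rather than a faithful channel model.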

Figure 2-1 illustrates the tradeoff between rate and reliability obtained at three levels of FEC with data acquired during a typical AUV mission near Guam. The mission was performed by the AUV Lucille. Lucille, a SeaBED-class[104] AUV operated by the NOAA Northwest Fisheries Science Center, was equipped with a WHOI Micro-Modem[36] and a 12.5 kHz ITC-3013 hemispherical transducer for acoustic communications. Messages sent using the 80 bps encoding and Frequency Hopping Frequency Shift Keying (FH-FSK) modulation, in red, were received consistently throughout the dive. Messages sent with the lowest level of FEC and Quadrature Phase Shift Keying (QPSK) modulation, in blue, were received inconsistently but delivered a significantly higher instantaneous throughput. The intermediate encoding with QPSK modulation, in green, performed between the two extremes. These statistics are typical of those realized in practice: achieved throughputs from free-swimming AUVs are commonly as low as tens or hundreds of bits per second, with long periods of disruption.

Figure 2-1: At top, the percentage of messages received in each eight-minute time period. At bottom, the instantaneous throughput in bits per second based on those percentages. Both plots have been filtered with a 5-point moving average to reduce jitter.
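The processing described in the caption of Figure 2-1 can be sketched as follows, assuming (for illustration) that the instantaneous throughput in a window is the nominal bit rate of the encoding multiplied by the fraction of frames received in that window. The rate and fractions in the usage example are placeholders, not the Guam data.

```python
from typing import List, Sequence


def throughput_bps(fraction_received: Sequence[float], nominal_bps: float) -> List[float]:
    """Per-window throughput: nominal bit rate scaled by fraction of frames received."""
    return [f * nominal_bps for f in fraction_received]


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; the window is truncated at the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # Placeholder per-window reception fractions and nominal rate (not measured data).
    fractions = [0.9, 0.1, 0.0, 0.7, 1.0, 0.3, 0.8]
    rate_bps = 1000.0
    print(moving_average(throughput_bps(fractions, rate_bps)))
```

Truncating the window at the ends keeps the smoothed series the same length as the input; padding or dropping the edge points are equally valid choices.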


2.1.1  Frame  Error  Rate  (FER) 

A frame is successfully received only if the number of bits corrupted during transmission does not exceed the number that the FEC is able to correct. The likelihood of bit errors is governed directly by the Signal to Noise Ratio (SNR) of the signal at the receiver, so the fraction of frames that get through is also a function of the SNR. Fig. 2-2 shows the measured FER versus SNR during a 2010 mission of the Lucille AUV. In September 2010, Lucille assisted in mapping the submerged portion of the San Andreas Fault off Northern California, at approximately 39°50'N, 124°W. During this survey, the AUV's onboard networking stack transmitted once every five seconds using QPSK and alternating levels of FEC. A particularly interesting case study of frame error rates in the vertical channel is provided by Singh et al.[105], which analyzes data obtained during a full-ocean-depth experiment in the Mariana Trench.
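The dependence of frame success on the number of correctable errors can be made concrete with an idealized model: assume independent bit errors with probability p (a simplification, since real acoustic channels produce bursty errors) and a decoder that corrects any pattern of up to t errors in an n-bit coded frame. The frame is then lost whenever more than t bits are corrupted. The 2048-bit frame and 40-error capability in the usage example are invented for illustration and are not Micro-Modem parameters.

```python
from math import exp, lgamma, log, log1p


def frame_error_rate(n_bits: int, t_correctable: int, ber: float) -> float:
    """Probability that more than t of n independent bits are in error.

    Assumes i.i.d. bit errors with probability `ber` and a decoder that
    corrects every pattern of at most t errors. Terms are summed in log
    space so large frames do not overflow the binomial coefficients.
    """
    if not 0 <= t_correctable <= n_bits:
        raise ValueError("need 0 <= t_correctable <= n_bits")
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must lie in [0, 1]")
    if ber == 0.0:
        return 0.0
    if ber == 1.0:
        return 0.0 if t_correctable == n_bits else 1.0
    log_p, log_q = log(ber), log1p(-ber)
    p_frame_ok = 0.0
    for k in range(t_correctable + 1):
        # log of the binomial probability of exactly k bit errors
        log_term = (lgamma(n_bits + 1) - lgamma(k + 1) - lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        p_frame_ok += exp(log_term)
    return max(0.0, 1.0 - p_frame_ok)


if __name__ == "__main__":
    # Hypothetical 2048-bit coded frame correcting up to 40 bit errors.
    for ber in (1e-3, 5e-3, 1e-2, 2e-2, 5e-2):
        print(f"BER {ber:.0e} -> FER {frame_error_rate(2048, 40, ber):.3g}")
```

Sweeping the bit error rate shows the characteristic "waterfall": the FER stays near zero until the expected number of bit errors approaches t, then rises steeply, which is why small SNR changes can switch a link from reliable to unusable.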
Figure 2-2: Mean Frame Error Rate versus measured SNR of detected packets during the Pacific Storm 2010 field experiment. The solid line is a Wiener-filtered 'best-fit' line to the points for each level of FEC. For very low SNR values, it is likely that many packets are simply not being detected. The packets were transmitted using 4 kHz bandwidth around a center frequency of 10 kHz.


2.1.2  Receiver  SNR  modelling

For a narrowband signal, the SNR is the ratio of the received signal strength to the strength of the ambient noise, as shown in (2.1), where the received signal strength is the transmission power reduced by the attenuation due to transmission losses. Here P is the initial transmission power, A is the attenuation through the water column, N is the noise level, and d and f are distance and frequency.

$$\mathrm{SNR}(d,f) = \frac{P}{A(d,f)\,N(f)} \qquad (2.1)$$


Attenuation 

Following closely the derivation by Stojanovic in [109], the attenuation of a narrowband acoustic signal underwater comes from absorption by water and spreading losses, as in (2.2).

$$A(d,f) = d^{k}\,a(f)^{d} \qquad (2.2)$$

The spreading losses are independent of frequency and are represented by d^k, where d is the propagation distance and k describes the propagation geometry as spherical (2.0) or 'practical' (1.5). The absorption coefficient, a(f), depends on frequency. For an unobstructed path, the coefficient can be modeled using Thorp's formula[119, 13], as expressed in (2.3), where frequency (f) is in kilohertz and a(f) is in decibels per kilometer:

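$$10\log a(f) = 0.11\,\frac{f^{2}}{1+f^{2}} + 44\,\frac{f^{2}}{4100+f^{2}} + 2.75\cdot 10^{-4} f^{2} + 0.003 \qquad (2.3)$$

$$10\log a(f) \approx 0.06\, f^{1.35} \qquad (2.4)$$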


Figure 2-3: Absorption Coefficient [dB/km] versus Frequency [kHz]. The solid line presents the absorption coefficient for a range of frequencies as modeled by Thorp, while the dotted line presents a numerically simpler approximation, 0.06 f^1.35. Both approximations produce similar values for the absorption coefficient across 100 Hz to 100 kHz.

Across the frequencies 100 Hz to 100 kHz, which include those used in long-range underwater acoustic communication, I have found that the absorption coefficient can be modeled by 10 log a(f) ≈ 0.06 f^1.35, where f is the frequency in kilohertz, as shown in Fig. 2-3.
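Eqs. (2.3) and (2.4) are easy to compare numerically. The following minimal Python sketch (function names are illustrative) evaluates both in dB/km with f in kHz, along with the absorption part of the path loss over a given distance.

```python
def thorp_db_per_km(f_khz: float) -> float:
    """Thorp's formula, Eq. (2.3): 10 log a(f) in dB/km, with f in kHz."""
    f2 = f_khz * f_khz
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def approx_db_per_km(f_khz: float) -> float:
    """Power-law fit, Eq. (2.4): 0.06 f^1.35 in dB/km, with f in kHz."""
    return 0.06 * f_khz ** 1.35


def absorption_loss_db(f_khz: float, d_km: float, model=thorp_db_per_km) -> float:
    """Absorption part of the path loss, d * 10 log a(f), over d_km kilometers."""
    return d_km * model(f_khz)


if __name__ == "__main__":
    for f in (0.1, 1.0, 3.0, 10.0, 25.0, 100.0):
        print(f"{f:6.1f} kHz  Thorp {thorp_db_per_km(f):8.4f}  "
              f"approx {approx_db_per_km(f):8.4f}  dB/km")
```

The printed table gives a direct view of where the power-law fit over- or under-estimates Thorp's formula; the fit trades some accuracy for a form that can be differentiated in closed form.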


Noise 


Noise in the ocean comes from four primary sources: turbulence, shipping, waves, and thermal agitation[20]. Each of these can be reasonably modeled with the approximations in Eqs. 2.5-2.8, where s represents the level of shipping traffic (0.0 to 1.0) and w is the wind speed in meters per second.


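$$10\log N_{\mathrm{turb}}(f) = 17 - 30\log f \qquad (2.5)$$

$$10\log N_{\mathrm{ship}}(f) = 40 + 20(s - 0.5) + 26\log f - 60\log(f + 0.03) \qquad (2.6)$$

$$10\log N_{\mathrm{wind}}(f) = 50 + 7.5\sqrt{w} + 20\log f - 40\log(f + 0.4) \qquad (2.7)$$

$$10\log N_{\mathrm{therm}}(f) = -15 + 20\log f \qquad (2.8)$$

$$N(f) = N_{\mathrm{turb}}(f) + N_{\mathrm{ship}}(f) + N_{\mathrm{wind}}(f) + N_{\mathrm{therm}}(f) \qquad (2.9)$$

$$10\log N(f) \approx 50 - 15\log f \qquad (2.10)$$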


Across the frequency range used by acoustic communication systems, the primary variable source of noise is the surface motion of waves, driven by wind. Fig. 2-4 illustrates the total value of the noise for three different levels of wind and shipping (Eq. 2.9), along with the noise approximation of 50 - 15 log(f) (Eq. 2.10) used by Stojanovic[109] and in this thesis.
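The noise model of Eqs. (2.5)-(2.10) can be evaluated with a short Python sketch. One detail worth making explicit: the components in (2.9) add as linear power spectral densities, so each must be converted out of decibels before summing. Function names and the parameter validation are illustrative choices.

```python
import math
from typing import Dict


def noise_components_db(f_khz: float, shipping: float, wind_mps: float) -> Dict[str, float]:
    """Eqs. (2.5)-(2.8): each noise source as 10 log N(f), with f in kHz."""
    if f_khz <= 0:
        raise ValueError("frequency must be positive")
    if not 0.0 <= shipping <= 1.0 or wind_mps < 0:
        raise ValueError("shipping must be in [0, 1] and wind speed non-negative")
    log_f = math.log10(f_khz)
    return {
        "turbulence": 17 - 30 * log_f,
        "shipping": 40 + 20 * (shipping - 0.5) + 26 * log_f - 60 * math.log10(f_khz + 0.03),
        "wind": 50 + 7.5 * math.sqrt(wind_mps) + 20 * log_f - 40 * math.log10(f_khz + 0.4),
        "thermal": -15 + 20 * log_f,
    }


def total_noise_db(f_khz: float, shipping: float = 0.5, wind_mps: float = 0.0) -> float:
    """Eq. (2.9): sum the components as linear powers, then return to dB."""
    comps = noise_components_db(f_khz, shipping, wind_mps)
    return 10 * math.log10(sum(10 ** (level / 10) for level in comps.values()))


def approx_noise_db(f_khz: float) -> float:
    """Eq. (2.10): the simplified 50 - 15 log f used in the rest of the chapter."""
    return 50 - 15 * math.log10(f_khz)


if __name__ == "__main__":
    for f in (1.0, 10.0, 25.0):
        print(f"{f:5.1f} kHz  calm {total_noise_db(f):6.2f} dB  "
              f"windy {total_noise_db(f, 0.5, 10.0):6.2f} dB  "
              f"approx {approx_noise_db(f):6.2f} dB")
```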

AN  Product 

If we assume a transmitter with fixed power and recall Eq. 2.1, the variable and frequency-dependent component of the receiver SNR is simply 1/(A(d,f)N(f)). Plotting this quantity versus frequency for several values of distance (d), as in Fig. 2-5, reveals clear maxima. For any given distance, there is therefore a frequency that maximizes the SNR, based upon the attenuation and noise. Using the approximations from Eqs. 2.10 and 2.4, we can solve for the closed-form solution shown in (2.16). This solution closely approximates the frequency that minimizes the AN product (and so maximizes the SNR) for frequencies between 100 Hz and 100 kHz, where d is the distance in kilometers and f the frequency in kilohertz.

Figure 2-4: Noise Level vs. Frequency. Solid lines display the noise level for three different levels of surface waves and shipping, as presented in [20]: 1) no wind, no shipping; 2) light breeze, minimal shipping; and 3) moderate breeze, heavy shipping. The dotted line represents the value of 50 - 15 log(f), used as an approximation.


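$$0 = \frac{\partial}{\partial f}\left[-10\log A(d,f) - 10\log N(f)\right] \qquad (2.11)$$

$$0 = \frac{\partial}{\partial f}\left[-\left(k\cdot 10\log d + d\cdot 10\log a(f)\right) - \left(50 - 15\log f\right)\right] \quad \text{from (2.2), (2.10)} \qquad (2.12)$$

$$0 = \frac{\partial}{\partial f}\left[-\left(k\cdot 10\log d + d\cdot 0.06\,f^{1.35}\right) - \left(50 - 15\log f\right)\right] \quad \text{from (2.4)} \qquad (2.13)$$

$$0 = \frac{\partial}{\partial f}\left[-d\cdot 0.06\,f^{1.35} + 15\log f\right] \qquad (2.14)$$

$$f^{1.35} = \frac{185.19}{d\,\ln 10} \qquad (2.15)$$

$$f^{1.35} = \frac{80.425}{d}, \quad\text{i.e.}\quad f = \left(\frac{80.425}{d}\right)^{0.741} \qquad (2.16)$$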
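The closed form in (2.16) can be checked against a brute-force maximization of the frequency-dependent SNR term in (2.14). This Python sketch uses only the approximations (2.4) and (2.10); the grid bounds and step count are arbitrary choices made for the check.

```python
import math


def freq_dependent_snr_db(f_khz: float, d_km: float) -> float:
    """The f-dependent bracket of Eq. (2.14): -d * 0.06 f^1.35 + 15 log f.

    Terms that do not depend on f (transmit power, k * 10 log d, and the
    constant 50 dB of the noise approximation) are dropped.
    """
    return -d_km * 0.06 * f_khz ** 1.35 + 15 * math.log10(f_khz)


def optimal_freq_closed_form(d_km: float) -> float:
    """Eq. (2.16): f = (80.425 / d)^(1 / 1.35), f in kHz and d in km."""
    return (80.425 / d_km) ** (1 / 1.35)


def optimal_freq_grid_search(d_km: float, lo_khz: float = 0.1, hi_khz: float = 100.0,
                             steps: int = 20_000) -> float:
    """Maximize the same objective over a logarithmically spaced grid."""
    best_f, best_val = lo_khz, -math.inf
    for i in range(steps + 1):
        f = lo_khz * (hi_khz / lo_khz) ** (i / steps)
        val = freq_dependent_snr_db(f, d_km)
        if val > best_val:
            best_f, best_val = f, val
    return best_f


if __name__ == "__main__":
    for d in (1.0, 5.0, 20.0):
        print(f"d = {d:4.1f} km: closed form {optimal_freq_closed_form(d):6.2f} kHz, "
              f"grid search {optimal_freq_grid_search(d):6.2f} kHz")
```

For example, the closed form gives roughly 25.8 kHz at 1 km, 7.8 kHz at 5 km, and 2.8 kHz at 20 km, and the grid search agrees to within its resolution.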

Figure 2-5: This figure depicts 1/(A(d,f)N(f)) for several values of d, the distance. This represents the frequency-dependent component of the Signal to Noise Ratio (SNR) at the receiver. Maxima are clearly visible for each range, indicating the optimal frequency (in terms of noise and attenuation) for transmission at that range [109].

The WHOI Micro-Modem operates around a center frequency of 10, 15, or 25 kHz. Most other commercially available modems operate at similar center frequencies. As shown in Fig. 2-6, these commercially available modems are designed to perform optimally at distances between 1 and 5 km. If we invert the plot shown in Fig. 2-5 to generate Fig. 2-7, we can observe the performance of each frequency across a range of distances. While a 25 kHz modem operates well over short (<1 km) distances, its performance falls off rapidly as distance increases. While 3 kHz modems have been used for long-distance underwater communication in the past, they perform significantly worse at short and medium ranges. 10 kHz modems therefore represent a good compromise for high performance over both short and long distances. For this reason, 10 kHz is the frequency used by most long-range AUVs for communication.
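The inversion in Fig. 2-7 amounts to asking which candidate center frequency maximizes the frequency-dependent SNR term at a given distance. The following Python sketch works under the same approximations (2.4) and (2.10), with the three candidate frequencies taken from the text; it is an illustration, not the code used to produce the figure.

```python
import math
from typing import Sequence

CANDIDATE_KHZ = (3.0, 10.0, 25.0)  # center frequencies discussed in the text


def freq_dependent_snr_db(f_khz: float, d_km: float) -> float:
    """-d * 0.06 f^1.35 + 15 log f, from approximations (2.4) and (2.10)."""
    return -d_km * 0.06 * f_khz ** 1.35 + 15 * math.log10(f_khz)


def best_center_frequency(d_km: float, candidates: Sequence[float] = CANDIDATE_KHZ) -> float:
    """Candidate frequency with the highest frequency-dependent SNR at d_km."""
    return max(candidates, key=lambda f: freq_dependent_snr_db(f, d_km))


def crossover_km(f_low_khz: float, f_high_khz: float) -> float:
    """Distance at which two frequencies give equal frequency-dependent SNR.

    Beyond this distance the lower frequency wins, because absorption grows
    with f^1.35 while the noise advantage of the higher frequency is fixed.
    """
    noise_gain_db = 15 * math.log10(f_high_khz / f_low_khz)
    absorption_gap_db_per_km = 0.06 * (f_high_khz ** 1.35 - f_low_khz ** 1.35)
    return noise_gain_db / absorption_gap_db_per_km


if __name__ == "__main__":
    for d in (0.5, 1.0, 5.0, 10.0, 50.0):
        print(f"{d:5.1f} km -> {best_center_frequency(d):4.1f} kHz")
```

Under these approximations the crossover distances come out near 1.8 km (25 vs. 10 kHz) and 7.3 km (10 vs. 3 kHz), which differ somewhat from the values read off Fig. 2-7; treat them as rough indications rather than a reproduction of the figure.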
Figure 2-6: Optimal frequency vs. distance as calculated by the approximation in (2.16), f = (80.425/d)^0.741. Frequencies used for AUV communication are indicated, along with the 3 kHz band that offers benefits for long-range single-hop communication.

2.1.3  Motivation  for  a  small  multi-hop  AUV  network 

Every non-decodable transmitted packet wastes, at a minimum, the power required to transmit it, the time to transmit it, and the power required to attempt to decode it. In addition, some form of feedback from receiver to transmitter may also be required to convey the miscommunication. As shown previously, the FER is closely related to the SNR at the receiver, which is governed by the distance between nodes, the operating frequency, and the transmission power. In particular, to achieve some target FER, P_fe, there exists some minimum SNR such that the achieved FER is less than P_fe. The particular relationship between SNR and FER depends on the level and type of FEC applied, and on the modulation scheme. Fig. 2-8 indicates this relationship for the five QPSK-modulated FEC levels supported by the WHOI Micro-Modem, computed assuming no Inter-Symbol Interference (ISI) and Additive White Gaussian Noise (AWGN).

To ensure transmission across a given distance, the power of the transmitter could simply be increased. This strategy does not by itself, however, provide a workable solution for communicating data from AUVs over a long distance, for two reasons. First, AUVs have a limited supply of power available, relying on large battery packs to sustain them until recovery. Second, many acoustic modems have a fixed transmission power, which cannot be controlled underwater. In [109], the power required to achieve a fixed SNR is shown to have a power-law relationship with distance (Eq. 2.18). Similarly, for a given SNR, a channel has a certain theoretical maximum capacity, which decreases with distance (Eq. 2.17). The closed-form functions below are derived for these relationships, where c, p, γ, and φ are constants derived from modeling in the same paper[109] and dependent upon the desired SNR.

$$C(d) = c\,d^{-\gamma} \qquad (2.17)$$

$$P(d) = p\,d^{\phi} \qquad (2.18)$$

Figure 2-7: Frequency-dependent component of SNR vs. distance for three modem frequencies. SNR is on the vertical axis, with higher SNR indicating a greater likelihood of reception. The horizontal axis displays transmission distance. Out to about a kilometer and a half, a 25 kHz modem is the most efficient. Long-range performance is, however, quite poor. 10 kHz performs the best all the way out to approximately 10 km. 3 kHz links have significantly poorer performance, nearly 10 dB, over both short and medium ranges. A 10 kHz center frequency reflects a reasonable compromise between short- and long-range performance.
In [110], the case is made that using multiple relay 'hops' to communicate data allows for more efficient power usage. I trace that argument here, and point out that in the case of AUVs the number of relays will necessarily be small. In particular, small multi-hop relay chains of AUVs offer an appropriate solution for communication over many tens of kilometers.

Figure 2-8: Frame Error Rate (FER) for the WHOI Micro-Modem versus receiver symbol SNR [dB], derived from simulation. Additive White Gaussian Noise (AWGN) and the absence of Inter-Symbol Interference (ISI) are assumed. Simulation results provided by Sandipa Singh, Acoustic Communications Laboratory, WHOI.

In order to transmit data as efficiently as possible, we seek to transmit messages using the minimum amount of power while still ensuring that the message is successfully received. In other words, we want to minimize the ratio of power (P) to capacity (C). Considering not only a single hop but multiple hops, the total power used by the relay network would be n·P(d/n), where n is the number of hops and d is the total link distance. The capacity across each hop, and therefore across the entire sequence of hops, would be C(d/n). The energy (E) per bit, E_n(d) = n·P(d/n)/C(d/n), can then be derived for a given SNR. Fig. 2-9 shows the energy per bit for a target SNR of 20 dB.
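The energy-per-bit expression follows directly from (2.17) and (2.18). The Python sketch below evaluates it; the constants p, phi, c, and gamma are hypothetical placeholders, not the values fitted in [109] for a 20 dB target SNR. Substituting the power laws gives E_n(d) = (p/c) d^(phi+gamma) n^(1-phi-gamma), so the energy per bit falls with n whenever phi + gamma > 1.

```python
def capacity(d_km: float, c: float, gamma: float) -> float:
    """Eq. (2.17): C(d) = c * d^(-gamma)."""
    return c * d_km ** (-gamma)


def power_required(d_km: float, p: float, phi: float) -> float:
    """Eq. (2.18): P(d) = p * d^phi."""
    return p * d_km ** phi


def energy_per_bit(d_km: float, n_hops: int, p: float, phi: float,
                   c: float, gamma: float) -> float:
    """E_n(d) = n * P(d/n) / C(d/n): total transmit power over end-to-end capacity."""
    if n_hops < 1:
        raise ValueError("need at least one hop")
    hop_km = d_km / n_hops
    return n_hops * power_required(hop_km, p, phi) / capacity(hop_km, c, gamma)


if __name__ == "__main__":
    # Hypothetical constants chosen only to show the trend.
    consts = dict(p=1.0, phi=1.5, c=1.0, gamma=0.5)
    for n in (1, 2, 4, 8):
        print(n, energy_per_bit(50.0, n, **consts))
```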

While transmission efficiency is greater with a larger number of hops, this analysis assumes that there is no cost associated with adding a single hop. By combining the transmission cost, E_b, with a fixed per-node cost, an expression for the optimum number of relay hops for a given communication distance can be obtained. In [110], an empirically determined per-hop cost of 120 dB is used. Using this analysis, the optimal number of hops is found to be fewer than nine for ranges out to 50 kilometers. It is reasonable to expect that the per-hop cost is linear and relatively low for pre-existing, fixed seafloor nodes.

Figure 2-9: Energy per bit versus the number of hops, from [109]. The energy per bit required can be reduced by transmitting across shorter distances, or by increasing the number of hops present in the network.

Figure 2-10: The optimal number of hops for communicating over five different ranges. The gray lines in both panels represent the optimal number as calculated in [110]. The top panel, (a) Higher per-node cost, shows the effect of increasing the (empirical) per-node fixed cost by 6 dB. The bottom panel indicates the result of introducing a deployment cost based on the distance travelled by each node.

AUVs serving as hops, however, come with significant costs in practice. Each additional hop requires the purchase and deployment of another vehicle, a more complicated task for vehicles than for simple seafloor nodes. Increasing the per-hop cost by 6 dB significantly decreases the determined number of optimal hops over longer ranges, as in Fig. 2-10a. If the vehicles must travel from the network endpoint to their relay location, as when deploying through ice, the additional energy for this deployment process may be significant and should be included in the cost function. For the case of a linear sequence of relays, this cost is the inter-hop distance multiplied by the number of hops each vehicle must travel, times some transit cost. Fig. 2-10b shows that incorporating such a cost also has the effect of skewing the cost function towards a lower number of optimal hops. Adding a second, third, or fourth AUV to the water also increases the complexity of an expedition significantly, increasing the risk of losing any single vehicle. Even without establishing the actual value of the per-node cost, it becomes clear that operating in hazardous environments with significant external per-node costs will result in a lower theoretical optimum for the relay network size. While Stojanovic shows that nine hops is optimal for a link distance of nearly 50 kilometers[110], these results with alternative cost functions indicate that communication over distances of up to 80 kilometers may be optimally performed in fewer than nine hops. This formulation assumes that no power is spent to receive data, only to transmit. Zorzi et al.[137] show that in the specific case of the WHOI Micro-Modem, the energy used when receiving packets becomes a dominant factor in total energy consumption after only a few hops. To communicate over 50 kilometers, their results suggest that only four nodes are optimal.
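To see how fixed per-node and transit costs skew the optimum, one can add them to the energy-per-bit term and search over n. The Python sketch below does this with a deliberately simple, hypothetical cost model: the link constants are placeholders, the per-node and transit costs are in the same arbitrary units as the energy term, and relay i is assumed to travel i inter-hop distances out from the endpoint. It illustrates the qualitative trend only and does not reproduce Fig. 2-10.

```python
def energy_per_bit(d_km: float, n: int, p: float = 1.0, phi: float = 1.5,
                   c: float = 1.0, gamma: float = 0.5) -> float:
    """n * P(d/n) / C(d/n) with hypothetical power-law constants."""
    hop = d_km / n
    return n * (p * hop ** phi) / (c * hop ** (-gamma))


def total_cost(d_km: float, n: int, node_cost: float, transit_cost_per_km: float) -> float:
    """Energy per bit plus a fixed cost per hop plus relay transit cost."""
    hop = d_km / n
    relay_transit_km = hop * sum(range(1, n))  # relay i travels i hops out
    return (energy_per_bit(d_km, n) + n * node_cost
            + transit_cost_per_km * relay_transit_km)


def optimal_hops(d_km: float, node_cost: float, transit_cost_per_km: float = 0.0,
                 n_max: int = 30) -> int:
    """Number of hops (1..n_max) that minimizes the total cost."""
    return min(range(1, n_max + 1),
               key=lambda n: total_cost(d_km, n, node_cost, transit_cost_per_km))


if __name__ == "__main__":
    print(optimal_hops(50.0, node_cost=10.0))
    print(optimal_hops(50.0, node_cost=40.0))
    print(optimal_hops(50.0, node_cost=40.0, transit_cost_per_km=1.0))
```

With these placeholder numbers the optimum for 50 km drops from 16 hops (node cost 10) to 8 hops (node cost 40), and to 6 hops once a transit cost of 1 per kilometer is added.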


2.2  Relaying  with  CAPTURE 


Figure  2-11:  A  three-hop  network,  labeled  with  the  names  used  in  this  thesis. 


The goal of a multi-hop relay system is to transmit data from the origin to the endpoint, across one or more 'hops', as in Fig. 2-11. As even compressed imagery data will easily dwarf the maximum transmission size of most acoustic modems, the data must be fragmented into pieces, or "segments", prior to being relayed across the multi-hop chain. It is tempting to consider this relaying as a simple extension of the single-hop case: each segment needs to be communicated to the next node in the relay chain, which is then responsible for delivering the segment further down the chain. While this approach has been used successfully by SeaWEB[90] and in other networking systems, it suffers from drawbacks, which we now discuss in turn.

First, the ocean is shared by all communicating nodes. Although individual vehicles may be unable to receive specific messages, underwater acoustic communication is broadcast in nature: transmitted messages will be heard by all vehicles within transmission range. This can be exploited to improve the throughput of the relay; if the origin's message is heard by the endpoint, there is no need for a relay AUV to repeat the message.

Second,  if  a  relay  AUV  leaves  the  network,  it  will  not  be  possible  to  forward  any 
segments  which  have  been  successfully  delivered  to  that  vehicle  but  not  forwarded 
beyond  it  until  that  vehicle  re-enters  the  network.  This  is  particularly  unfortunate 
given  the  ever-changing  underwater  environment,  which  can  frequently  cause  such 
disruptions. 

Third, simple relay communication systems do not take advantage of the specific capabilities of AUVs. Unlike seafloor nodes on long-term deployments, AUVs are equipped with large-capacity batteries, the capability for significant computation, and large amounts of data storage. This chapter therefore proposes a relay approach better suited to the unique capabilities and limitations of AUVs. The specific characteristics of this approach are now discussed in turn.

2.2.1  Store  and  Forward 

Terrestrial networking commonly relies on the capability to rapidly forward segments from one network node to another, dropping those segments that cannot be immediately forwarded due to a bad link. If the segments are important, they must be retransmitted by the origin. This is impractical in an underwater relay link, given the high probability of at least one hop failing. Early research into Disruption-Tolerant Networking[34] suggests that employing a "store and forward" approach, where relays store data until passing it off to another relay, can significantly improve the performance of high-latency and intermittently connected networks. An implementation of this strategy, the Bundle Protocol, is now being pursued under the auspices of the Internet Society's Delay Tolerant Networking Research Group (DTNRG)[102]. In that work, responsibility for eventual data transmission can be conveyed from one node to another node, which is then obligated to deliver the data at all costs. The original node can then delete the stored data to free up storage.

While storage is a relevant concern on space vehicles, which may be deployed indefinitely, it is less of a concern for underwater vehicles. Relative to the bandwidth available with modern acoustic modems, AUVs can be considered to have nearly infinite storage. Ten AUVs communicating constantly at a generous throughput of 10 kbps for one month would have exchanged only about thirty gigabytes, which fits easily on a small and cheap flash drive. CAPTURE nodes exploit this capability mismatch by having every node in the network permanently store each piece of data that it overhears, regardless of the transmitter.
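The storage arithmetic and the keep-everything policy can be sketched in a few lines of Python. The store and its (resource id, segment index) key are hypothetical illustrations rather than CAPTURE's actual data structures. Ten vehicles at 10 kbps for 30 days gives about 3.2 × 10^10 bytes, consistent with the figure of roughly thirty gigabytes above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

SegmentKey = Tuple[int, int]  # hypothetical key: (resource id, segment index)


def bytes_exchanged(n_vehicles: int, bps: float, days: float) -> float:
    """Upper bound on data exchanged if every vehicle sends continuously at bps."""
    return n_vehicles * bps * days * 86_400 / 8


@dataclass
class OverheardStore:
    """Keep every decoded segment a node hears, whoever sent it; never evict."""
    segments: Dict[SegmentKey, bytes] = field(default_factory=dict)

    def on_receive(self, resource_id: int, index: int, payload: bytes) -> bool:
        """Store a segment and return True if this node did not already hold it."""
        key = (resource_id, index)
        if key in self.segments:
            return False
        self.segments[key] = payload
        return True

    def held_indices(self, resource_id: int) -> Set[int]:
        """Segment indices of one resource that this node currently holds."""
        return {index for (res, index) in self.segments if res == resource_id}


if __name__ == "__main__":
    # Ten vehicles at 10 kbps for 30 days: roughly 3.2e10 bytes (about 32 GB).
    print(f"{bytes_exchanged(10, 10_000, 30) / 1e9:.1f} GB")
```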

2.2.2  Broadcast  Medium 

In most networks, including the space networks targeted by the DTNRG, transmissions are relayed from a single network node to another single node. Underwater, all transmissions are broadcast in nature. Rather than focusing on relaying a specific segment of data from the first hop to the second hop, CAPTURE nodes track which segments are known to be possessed by any vehicle closer to the endpoint. Segments that are not known to be possessed by downstream vehicles are then transmitted. CAPTURE encodes enough metadata to uniquely identify every segment of data that it transmits. This allows any receiver to fully decode any received segment of data, regardless of whether it has previously received any information about the resource it belongs to. If a receiver is downstream from the intended recipient, it may be unnecessary for the intended recipient to ever transmit that segment.
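Self-describing segments can be illustrated with Python's `struct` module. The 7-byte header below (origin id, resource id, segment index, total segment count) is a hypothetical layout chosen for the example; the actual CAPTURE format is not specified here. The 32-byte frame size in the test mirrors the smallest Micro-Modem packet mentioned earlier.

```python
import struct
from typing import Dict, List, NamedTuple

# Hypothetical header: origin id (1 byte), resource id, segment index,
# and total segment count (2 bytes each), big-endian. 7 bytes in total.
HEADER = struct.Struct(">BHHH")


class Segment(NamedTuple):
    origin: int
    resource: int
    index: int
    total: int
    payload: bytes


def fragment(data: bytes, origin: int, resource: int, frame_bytes: int) -> List[bytes]:
    """Split `data` into self-describing frames no larger than `frame_bytes`."""
    room = frame_bytes - HEADER.size
    if room <= 0:
        raise ValueError("frame too small to hold the header")
    chunks = [data[i:i + room] for i in range(0, len(data), room)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("too many segments for a 16-bit index")
    return [HEADER.pack(origin, resource, i, len(chunks)) + chunk
            for i, chunk in enumerate(chunks)]


def parse(frame: bytes) -> Segment:
    """Decode one frame without any prior knowledge of its resource."""
    origin, resource, index, total = HEADER.unpack_from(frame)
    return Segment(origin, resource, index, total, frame[HEADER.size:])


def reassemble(frames: List[bytes]) -> bytes:
    """Rebuild one resource from its frames, in any order, duplicates allowed."""
    parts: Dict[int, bytes] = {}
    total = None
    for frame in frames:
        seg = parse(frame)
        total = seg.total
        parts[seg.index] = seg.payload
    if total is None or sorted(parts) != list(range(total)):
        raise ValueError("resource is incomplete")
    return b"".join(parts[i] for i in range(total))
```

Because every frame carries its own resource and index, a node that overhears frames out of order, or hears some from the origin and others from a relay, can still slot each one into place.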
2.2.3  Selective  Acknowledgement

While the use of multiple relay vehicles increases the overall efficiency of an acoustic link, it also introduces additional challenges. Even the simple three-hop network shown in Fig. 2-11, if spread across twenty kilometers, would have an end-to-end latency from the origin to the endpoint of tens of seconds. Since vehicles move relative to each other, there may be long periods without an end-to-end path through the network. Were the origin to wait for confirmation that the endpoint had received a data segment before moving on to a new piece of data, end-to-end transmission would slow to a crawl. Even waiting for acknowledgement of reception from the next node in the chain halves throughput, since each transmission must take twice as long to allow for the round-trip acknowledgement.

Selective acknowledgement is a well-established technique, and has even been added to mainstream protocols like the Transmission Control Protocol (TCP). Instead of transmitting an acknowledgement after every received segment of data, a single acknowledgement is sent at some future time that allows the transmitter to identify which segments were successfully received and which were not. Drawing inspiration from peer-to-peer file sharing services and the work of Wiemann et al.[128], CAPTURE nodes acknowledge not only the segments that they possess but also, node by node, all of the segments that other nodes report possessing. This epidemic routing of the segment masks would not be practical for large numbers of nodes, but is possible for these small relay networks. Keeping track of which segments have been received locally, and which are known to be possessed by a downstream node, also aids in prioritizing segments for transmission. If a downstream node is disabled before successfully passing on segments it possesses, other nodes can easily identify which pieces remain to be relayed and fill them in. This stands in contrast to networks that hand off delivery responsibility.
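A minimal sketch of this bookkeeping follows, assuming each node's holdings for a single resource are summarized as an integer bitmask and that acknowledgements carry one mask per node. The representation is a hypothetical illustration of the idea rather than CAPTURE's wire format.

```python
from typing import Dict, Iterable, List

# node id -> bitmask of segment indices that node is known to hold
Knowledge = Dict[int, int]


def merge_ack(local: Knowledge, reported: Knowledge) -> None:
    """Fold an overheard acknowledgement (which relays every node's mask) into local state.

    Because nodes keep every segment permanently, holdings only grow, so a
    bitwise OR is a safe merge even if acknowledgements arrive out of order.
    """
    for node, mask in reported.items():
        local[node] = local.get(node, 0) | mask


def segments_to_send(my_mask: int, local: Knowledge, downstream: Iterable[int],
                     total: int) -> List[int]:
    """Segments this node holds that no downstream node is known to hold."""
    covered = 0
    for node in downstream:
        covered |= local.get(node, 0)
    pending = my_mask & ~covered
    return [i for i in range(total) if (pending >> i) & 1]
```

In the test below, node 2 and node 3 together cover all four segments, so nothing needs sending; if node 2 were disabled, dropping it from the downstream list immediately exposes segments 0, 1, and 3 as still needing relay.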
2.2.4  Routing  and  disruption  tolerance

While  this  thesis  primarily  provides  an  approach  to  returning  data  from  a  single 
vehicle  by  way  of  multiple  relay  'hops',  it  is  desirable  that  the  identity  of  the  origin 
vehicle  be  able  to  change  over  the  course  of  a  dive.  In  addifion,  relay  vehicles  may 
become  disabled  and  unable  to  perform  their  duties.  It  is  important,  therefore, 
that  there  exists  a  method  for  specifying  which  vehicle  is  the  origin,  and  what  the 
sequence  of  vehicle  'hops'  is  that  will  convey  data  to  the  surface  endpoint. 

In large networks, routing tables typically prescribe the ideal path through a network from one node to any other node. In a large mobile network, determining these tables presents a significant challenge. Ad-hoc routing methods designed for networks of unknown connectivity, such as AODV[81], would seem an ideal fit, yet high latencies make on-demand route discovery challenging. In an underwater network of AUVs, surface operators frequently have out-of-band information, such as vehicle locations and future mission plans, that may inform the selection of an appropriate route. Rather than having nodes attempt to learn routing information independently, I propose that surface operators are best equipped to identify which vehicle should transmit as the 'origin', and which vehicles are most appropriate to aid in relay communication. For networks of fewer than eight vehicles, each vehicle identifier fits in three bits, so including the route in each packet consumes very few bits. CAPTURE therefore includes such information in every acknowledgement.
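To make the bit cost concrete, the following sketch packs an operator-chosen route into an integer bit field. The layout (a 3-bit vehicle count followed by one 3-bit identifier per vehicle, origin first) is a hypothetical format chosen for illustration, not CAPTURE's documented packet structure.

```python
from typing import List

ID_BITS = 3  # enough for vehicle IDs 0-7 (fewer than eight vehicles)


def pack_route(route: List[int]) -> int:
    """Pack an origin-to-endpoint route into an integer bit field.

    Hypothetical layout: a 3-bit count in the lowest bits, then one
    3-bit vehicle ID per position, origin first.
    """
    if not 0 < len(route) < 8:
        raise ValueError("route must list between 1 and 7 vehicles")
    packed = len(route)
    for position, vehicle in enumerate(route):
        if not 0 <= vehicle < 2 ** ID_BITS:
            raise ValueError(f"vehicle id {vehicle} does not fit in {ID_BITS} bits")
        packed |= vehicle << (ID_BITS * (position + 1))
    return packed


def unpack_route(packed: int) -> List[int]:
    """Inverse of pack_route."""
    mask = 2 ** ID_BITS - 1
    count = packed & mask
    return [(packed >> (ID_BITS * (i + 1))) & mask for i in range(count)]


def route_bits(route: List[int]) -> int:
    """Bits the packed route occupies in an acknowledgement."""
    return ID_BITS * (len(route) + 1)
```

A four-vehicle route such as [3, 1, 4, 0] costs 15 bits under this layout, small relative to even a 64-byte frame.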

2.3  Comparison  of  Performance 

To illustrate the benefits of an architecture like CAPTURE that incorporates these techniques, a set of network simulations was run. Three protocols were implemented in Python and simulated under a variety of conditions. The first protocol is a basic node-wise acknowledgement protocol, which requires each segment to be successfully received by the next node before accepting an additional segment. This protocol would be expected to perform poorly, though it has three benefits: it is similar to what is currently done in practice, it is simple to implement, and it uses no portion of the transmissions for metadata.

Figure 2-12: Frame Error Rate (FER) versus distance (x-axis: Range) used in the simulations below.
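The expected weakness of node-wise acknowledgement can be quantified with a short calculation. Assuming, as a simplification not stated for the simulations above, that a data frame and its immediate acknowledgement are each lost independently with the same frame error rate p, an attempt succeeds with probability (1 - p)^2, so the number of attempts per segment on one hop is geometric with mean 1/(1 - p)^2. The sketch below checks that formula with a small Monte Carlo run; the function names are hypothetical.

```python
import random


def expected_attempts(fer: float) -> float:
    """Mean attempts per segment on one hop under node-wise acknowledgement.

    Assumes the data frame and its acknowledgement are lost independently,
    each with frame error rate `fer`; an attempt succeeds only if both arrive.
    """
    return 1.0 / (1.0 - fer) ** 2


def simulate_attempts(fer: float, segments: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (requires fer < 1)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(segments):
        attempts = 1
        # Retry while either the data frame or its acknowledgement is lost.
        while rng.random() < fer or rng.random() < fer:
            attempts += 1
        total += attempts
    return total / segments
```

At p = 0.5 the mean is four attempts per segment per hop, before any additional relay delay is counted.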

The second protocol implements selective acknowledgement without the additional improvements incorporated into CAPTURE. After transmitting six data segments, each relay node transmitted a segment mask of all the segments it had received. The origin and endpoint transmitted only chunks and segment lists, respectively.


Figure 2-13: CAPTURE's performance versus simpler protocols. Note that the Y axis indicates contiguous bytes received, starting with the first.


For the purposes of the simulation, a fixed frame size of 64 bytes per transmission was used, along with the frame error rates shown in Fig. 2-12. The simulated relay link consisted of four vehicles spaced two kilometers apart, spanning six kilometers in total, with a fixed TDM cycle. The TDM cycle allowed the origin and the two relays to each transmit three times for every single transmission by the endpoint; since the endpoint has no data to transmit, it need not communicate as frequently. A single cycle of the CAPTURE and selective acknowledgement protocols was assumed to take 5 seconds, whereas a single cycle of the node-wise acknowledgement protocol was assumed to take 8 seconds, to account for the required immediate return acknowledgement. Ten hours of transmission were simulated, with the middle three hours shown in Fig. 2-13. After successfully transmitting a minimum of 1600 bytes, each node began transmitting a new artificial data source. For the simulation run illustrated, the number of preview images successfully received within ten hours is shown in Table 2.1.
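The TDM arrangement described above can be sketched as a repeating list of slot owners. The round-robin ordering of senders within a frame is an assumption for illustration; the text specifies only the three-to-one ratio between the transmitting nodes and the endpoint.

```python
from itertools import cycle, islice
from typing import List


def tdm_frame(senders: List[str], endpoint: str, repeats: int = 3) -> List[str]:
    """Slot owners for one frame: every sender `repeats` times, then the endpoint once.

    The round-robin order of the senders is an assumed convention.
    """
    return senders * repeats + [endpoint]


def first_slots(frame: List[str], count: int) -> List[str]:
    """Owners of the first `count` slots when the frame repeats indefinitely."""
    return list(islice(cycle(frame), count))


frame = tdm_frame(["origin", "relay1", "relay2"], "endpoint")
```

Under this layout the endpoint owns one slot in ten, which it uses only for acknowledgements.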


Protocol                     Previews Received
Node-wise Acknowledgement    8
Selective Acknowledgement    28
CAPTURE                      40

Table 2.1: Number of preview-sized 'images' received over the course of a 10-hour simulation of a three-hop (four-vehicle) network.

The simplest protocol, node-wise acknowledgement, performed quite poorly, as expected. Both CAPTURE and the selective acknowledgement protocol show significant non-linearities in the progress of each image preview: these jumps occur when a missing segment is received that connects a large number of already-received segments to the first segments. The performance of the network is closely tied to the frame error rate of each hop, which in turn depends strongly on the length of the hop. Fig. 2-15 shows the results of running the same simulation several times for a simpler two-hop network. The x-axis is the distance between the origin and the relay, and the y-axis is the distance between the relay and the endpoint.
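The jumps follow directly from the metric plotted in Fig. 2-13, which counts only contiguous bytes starting with the first. A minimal sketch, assuming for illustration that each segment carries 64 payload bytes (ignoring header overhead), so that a 1600-byte preview spans 25 segments:

```python
from typing import Set


def contiguous_bytes(received: Set[int], segment_bytes: int, total_segments: int) -> int:
    """Bytes usable by the decoder: the unbroken run of segments starting at 0."""
    run = 0
    while run < total_segments and run in received:
        run += 1
    return run * segment_bytes
```

With segments 0, 1, and 3 through 6 received, only 128 bytes count; the arrival of segment 2 alone raises the total to 448 bytes.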
Figure 2-14: Results of simulated transmission across two hops using CAPTURE, by distance (x-axis: Hop 1 Distance (km)). The color represents the number of 1600-byte preview images received during a simulation of a twelve-hour mission.
Figure 2-15: Results of simulated transmission across two hops, comparing CAPTURE and simpler protocols (panels: (a) 2 hops, repeating until acknowledged; (b) 2 hops using selective acknowledgement; (c), (d) log ratio). After simulated transmission for twelve hours, the number of successfully received 1600-byte image thumbnails was compared between CAPTURE and each simpler protocol. At bottom, the ratio of CAPTURE's performance to that of the simpler protocols is shown on a logarithmic scale. For very low probabilities of frame error (< 5%), the low overhead of the selective acknowledgement protocol allows it to outperform CAPTURE by nearly 10%. Otherwise, CAPTURE significantly outperforms both alternatives.
2.4  Humans  in  the  loop


While it is mathematically satisfying to consider metrics such as channel capacity and throughput, they do not fully capture the utility of the data transmitted to the surface. What is transmitted is just as important as how effectively it is transmitted. Autonomous robots are used in both exploration and emergency response. While it is no doubt possible to codify into an algorithm the appropriate search method for a submerged oil plume, there simply is not time to prepare such complicated behaviors in the wake of disasters like the Deepwater Horizon spill. Involving human operators in the selection and prioritization of telemetry can increase the overall value of the telemetry just as much as increasing the throughput. In the next chapter, I present compression methods that both encode data efficiently, to make the best use of the limited throughput, and increase the overall efficiency of AUV telemetry by incorporating user feedback.
CHAPTER  3


Data  Coding 


This chapter outlines two key characteristics of the telemetry compression algorithms used by CAPTURE (efficient bandwidth usage and progressive encoding) and discusses their importance to operating in underwater environments. Methods for compressing typical AUV data subject to those constraints are then presented. Results of applying these methods to both imagery and scalar environmental data are compared against current approaches using data collected during AUV missions. I propose a new compression technique whereby wavelet coefficients are pre-scaled with a weighting function prior to quantization. This enables transmission of greater detail in the most important areas of a signal while minimizing the number of bits used elsewhere in that signal. Wavelet compressors are highly efficient at encoding intra-image redundancy, achieving some of the highest known compression ratios on single images. For images about which we have prior information, or for sequences of repetitive imagery, it seems beneficial to seek an algorithm that also exploits the significant inter-image redundancy. I present such an algorithm, which relies on texture segmentation, classification, and synthesis for image compression. We start with a brief review of the current state of the art.

3.1  Background 

AUV missions primarily call for collecting two forms of data: readings from scalar environmental sensors, and sonar or optical imagery. Over the course of a dive, an AUV could easily collect one million samples of scalar environmental data, such as water temperatures or methane concentrations[16]. In addition to that data, SeaBED AUVs capture color photographs every 3 seconds at 1360 x 1024 resolution and 36 bits per pixel, from each of up to four cameras: tens of thousands of images per dive. Transmitting a single one of these images uncompressed would take nearly two days at a sustained (and optimistic) rate of 300 bits per second. Getting every bit of the collected data to the surface during a mission is currently impossible.
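The "nearly two days" figure follows from simple arithmetic on the numbers above: one uncompressed frame is 1360 x 1024 x 36 = 50,135,040 bits, which at 300 bits per second takes about 167,117 seconds, or roughly 1.93 days. The sketch below reproduces the calculation; the helper names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def raw_image_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed image in bits."""
    return width * height * bits_per_pixel


def transmit_days(bits: int, rate_bps: float) -> float:
    """Days needed to send `bits` at a sustained link rate in bits per second."""
    return bits / rate_bps / SECONDS_PER_DAY


bits = raw_image_bits(1360, 1024, 36)   # one SeaBED frame, per the text
days = transmit_days(bits, 300)          # optimistic acoustic rate from the text
print(f"{bits:,} bits -> {days:.2f} days")
```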

3.1.1  Scalar  Environmental  Data 


Figure 3-1: Sample scalar environmental data. Temperature data was collected over an archaeological site near Santa Barbara, California using a SeaBED AUV. The reduction potential data was collected as part of the Arctic Gakkel Vent Expedition [60], and provided by Dr. Koichi Nakamura.

Modern AUVs commonly transmit a predefined set of state data to the surface, such as the vehicle position, depth, battery life, heading, and similar status information. Some augment these transmissions with a small number of environmental data samples, though most environmental data is trapped on the vehicle until after recovery. Fig. 3-1 shows typical temperature and reduction potential (Eh) data acquired during two SeaBED AUV dives. While temperature varies throughout the mission, Eh remains relatively constant except for brief periods of activity.
The diversity of AUV missions has led to a variety of custom approaches to encoding and decoding these vehicle status messages. Many of these solutions are based on the CCL[111] standard for acoustic communication, which provides a number of standard algorithms for encoding 256-bit messages containing depth, latitude, bathymetry, altitude, salinity, and other data. The messages are designed to communicate only current information about the vehicle: if the communication link is temporarily not functioning, no data about the vehicle state during that time is later communicated to the surface. CCL also relies upon quantization alone to provide compression, for instance encoding heading with only 256 discrete values, for a precision of about 1.4 degrees. While this reduces the number of bytes used, it makes no use of the inherent correlation between successive heading, temperature, or salinity measurements in oversampled data. Recognizing that many of the measurements are oversampled, Eastwood et al. proposed predictive coding methods that improved the performance of CCL[28]. Schneider and Schmidt incorporate predictive coding into their recent work with the Dynamic Compact Control Language (DCCL)[99], sending a mean value followed by smaller, quantized difference values. For time-series data with significant redundancy, such as oversampled sensor readings, transform compression allows much higher efficiency.
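The idea of sending a reference value followed by small quantized differences can be sketched as closed-loop delta coding (DPCM). This is a generic illustration, not DCCL's actual field encoding: the block mean as the reference, the uniform step size, and the function names are assumptions.

```python
from typing import List, Tuple


def predictive_encode(samples: List[float], step: float) -> Tuple[float, List[int]]:
    """Closed-loop delta coding: a reference value plus integer step counts.

    Each sample is predicted by the previous *reconstructed* value, so
    quantization error does not accumulate at the decoder.
    """
    reference = sum(samples) / len(samples)  # assumed reference: the block mean
    prediction = reference
    codes = []
    for x in samples:
        q = round((x - prediction) / step)
        codes.append(q)
        prediction += q * step
    return reference, codes


def predictive_decode(reference: float, codes: List[int], step: float) -> List[float]:
    """Rebuild the samples from the reference and the difference codes."""
    out, prediction = [], reference
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out
```

Each decoded sample lies within half a quantization step of the original, while the codes themselves are small integers that need far fewer bits than full-precision samples when the data is oversampled.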

Transform compression methods typically follow a standard pattern. First, a source coder such as the Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT) exploits the inherent correlation within most data and concentrates most of the energy of the signal into a small, sparse set of coefficients. Ideally, these coefficients are no longer correlated with one another, since any remaining correlation could be exploited to compress them further [96]. Next, this smaller set of significant coefficients is encoded in a way that allows reconstruction of an approximation to those coefficients[94]. The process is simply reversed to decode an approximation to the original data. Interestingly, many transform compression methods can be used for both one- and two-dimensional data, simply by using the appropriate form of the source coder.
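The energy-compaction step of this pattern can be sketched in a few lines of NumPy. The sketch uses an orthonormal DCT-II built explicitly as a matrix, keeps only the largest-magnitude coefficients, and inverts; it stands in for the "encode significant coefficients" step without any actual bit coding, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (one basis vector per row); its transpose inverts it."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis


def keep_largest(signal: np.ndarray, keep: int) -> np.ndarray:
    """Transform, zero all but the `keep` largest-magnitude coefficients, invert."""
    signal = np.asarray(signal, dtype=float)
    basis = dct_matrix(len(signal))
    coeffs = basis @ signal
    drop = max(len(coeffs) - keep, 0)
    coeffs[np.argsort(np.abs(coeffs))[:drop]] = 0.0
    return basis.T @ coeffs
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, so keeping more of the largest coefficients can only reduce the error.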


3.1.2  Imagery 


There has been significant development of methods for the transmission of still[115] and video[80] imagery over relatively high-bandwidth (~1-10 kbps) acoustic tethers operating vertically. Early efforts employed the widely used JPEG image compression standard. JPEG performs transform compression using the DCT and a fixed quantization table for a pre-chosen quality. Craig Sayers and others at the University of Pennsylvania developed techniques for selecting specific frames and 'regions of interest' from a video sequence that best describe an ROV manipulator and environment state, and transmitted these regions to surface operators over a 10 kbps acoustic tether as JPEG images [97]. There are fewer examples of free-ranging AUVs telemetering imagery. In one SeaWEB[90] experiment, fixed seafloor nodes were used to relay a small number of images. Unfortunately, JPEG performs quite poorly at the high compression ratios needed for acoustic telemetry. Eastwood et al. evaluated the performance of an early wavelet-based compressor, EPIC, and found that it had benefits at low bitrates relative to JPEG [28]. In addition, previous studies have indicated that wavelet compression techniques are particularly applicable to underwater images, video, and acoustic imagery[44, 46, 47].


Figure 3-2: The spectrum of compression options for imagery, as developed in this thesis (x-axis: Bytes per Image, from .01 to 1,000,000+). Note that the options span many orders of magnitude.


Compressing imagery to a size appropriate for acoustic transmission, a few hundred to a few thousand bytes, requires very high compression ratios. While a typical compression ratio for a JPEG intended for human viewing might be 10:1 or 30:1, converting a one-megapixel color image to a few kilobytes implies a compression ratio of 1000:1 to 3000:1, two orders of magnitude higher. This necessitates the analysis and use of less common compression techniques. In this chapter, I present a range of options for communicating imagery, from summarizing an entire dataset as a time series to encoding individual images using wavelet compression, as shown in Fig. 3-2. I also present a novel method for image compression based on texture synthesis and texture classification. This method, nicknamed Image Synthesis, fits between transmission of individual images and summary dataset statistics.
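The 1000:1 to 3000:1 range can be checked with a one-line calculation. The sketch assumes 24 bits per pixel (8 bits per color channel) for the one-megapixel image; at the 36 bits per pixel of the SeaBED cameras, the required ratios would be 1.5 times higher.

```python
def compression_ratio(width: int, height: int, bits_per_pixel: int,
                      target_bytes: float) -> float:
    """Ratio of raw image size to the size budget for transmission."""
    raw_bytes = width * height * bits_per_pixel / 8
    return raw_bytes / target_bytes


# One megapixel at an assumed 24 bits per pixel is 3,000,000 bytes raw.
for target in (3000, 1000):
    print(f"{target} bytes -> {compression_ratio(1000, 1000, 24, target):.0f}:1")
```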

3.1.3  Discrete  Wavelet  Transform  (DWT) 

Transform compression using the DWT as a source coder, typically referred to as wavelet compression, has been found effective on a variety of real-world signals and imagery [14]. The DWT, a linear transform, is now widely used as a source coder for imagery and biomedical data. The DWT is calculated by applying a low-pass filter to the input signal, generating one set of coefficients, and then applying a high-pass filter to the input signal to generate a second set of coefficients. Both sets of coefficients are downsampled by two, resulting in the same number of coefficients as the original input signal had samples. Calculating the DWT of a signal thus results in two distinct sets of coefficients: a decimated version of the signal known as the 'approximation coefficients', and a set of 'detail coefficients' which contain the higher-frequency information lost during decimation. Fig. 3-3 shows the full wavelet decomposition of a short one-dimensional signal of 32 samples.
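A single level of the DWT as just described can be sketched with the Haar wavelet, the simplest case: the low-pass filter is a scaled pairwise sum and the high-pass filter a scaled pairwise difference, each downsampled by two. The Haar choice is for illustration only; the text does not specify which wavelet CAPTURE uses.

```python
import numpy as np


def haar_step(x: np.ndarray):
    """One level of the orthonormal Haar DWT (even-length input)."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-pass, downsampled by two
    detail = (even - odd) / np.sqrt(2)   # high-pass, downsampled by two
    return approx, detail


def inverse_haar_step(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    """Interleave the two half-length bands back into the original signal."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x
```

The two outputs together have exactly as many coefficients as the input had samples, and the inverse recovers the input exactly.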

The DWT is typically (as in Fig. 3-3) applied recursively to the approximation coefficients, generating several levels of detail coefficients; each level of detail coefficients then represents the detail lost by decimation at that iteration of the transform. Each detail coefficient in the resulting set is localized in time as well as being associated with a 'scale', or level of detail. The detail coefficients will generally be low in magnitude, except near areas of change at a given scale. This sparsity facilitates efficient compression of the data. DeVore and Lucier provide an excellent, well-written introduction to wavelets [24]. As the DWT is separable, multi-dimensional data can be transformed a single dimension at a time, following the same procedure.

Figure 3-3: Wavelet coefficient magnitude is shown by the stem plots at left (column titles: Level Coefficients, Level Contribution, Cumulative Reconstruction). The middle column indicates the sum of the inverse-transformed wavelets at that level of detail. By cumulatively summing the levels (right column), increasingly detailed approximations to the original signal are produced until the original signal is recovered at the bottom right.
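The recursive decomposition and the separable extension to two dimensions can be sketched as follows, again using the Haar wavelet purely for illustration. The band ordering [approximation, coarsest detail, ..., finest detail] and the function names are assumptions of this sketch.

```python
import numpy as np


def haar_step(x, axis=-1):
    """One orthonormal Haar level along `axis` (even length on that axis)."""
    x = np.asarray(x, dtype=float)
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def wavedec(x, levels):
    """Recursive 1-D DWT: returns [approx, coarsest detail, ..., finest detail]."""
    details, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, d = haar_step(approx)   # re-apply to the approximation only
        details.append(d)
    return [approx] + details[::-1]


def waverec(coeffs):
    """Invert wavedec by merging one detail band at a time, coarse to fine."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        x = np.empty(2 * len(approx))
        x[0::2] = (approx + d) / np.sqrt(2)
        x[1::2] = (approx - d) / np.sqrt(2)
        approx = x
    return approx


def haar2d_step(image):
    """Separable 2-D level: filter along rows, then along columns."""
    low, high = haar_step(image, axis=1)
    ll, lh = haar_step(low, axis=0)
    hl, hh = haar_step(high, axis=0)
    return ll, (lh, hl, hh)
```

For a piecewise-constant 32-sample signal, most detail bands come out exactly zero except at the block boundaries, which illustrates the sparsity described above.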


3.1.4  Embedded  Wavelet  Coding 

Progressive coding methods allow the reconstruction of intermediate data representations at one or more 'checkpoints' within an encoded data stream. Fully embedded coding methods have the additional property that they do not require targeting any specific image 'quality' or final size. Specifically, if data is compressed twice with a fully embedded encoder, to sizes M and N with M > N, then the first N bits are identical in both files. This makes fully embedded coding methods well suited to the underwater environment, where computational ability is limited, communication is packetized, and transmission rates can vary from packet to packet, because compression can be performed independently of the target transmission rate. Messages sent to nearby AUVs for multiple-vehicle collaboration could be sent at a higher rate, and those destined for a surface ship or for transmission over longer distances can be sent at a more conservative rate, without any need to recompress the data. Low-fidelity color image thumbnails, transmitted at rates as low as a few hundred bits per image, can later be used as a basis for more refined versions of the same image. If the entire bitstream is sent, the compression process is entirely reversible and results in the original data with no loss of precision. Combined with the success of wavelet-based analysis techniques in the underwater domain, this suggests that AUV networking can greatly benefit from the use of fully embedded wavelet compression.
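The prefix property can be illustrated with plain bit-plane coding of integer magnitudes, most significant plane first. This sketch achieves no compression (SPIHT's sorting bits are what make the stream compact) and omits signs; it only shows why a stream ordered from most to least significant bits can be cut at any point and still decoded. The function names are hypothetical.

```python
from typing import List


def bitplane_encode(values: List[int], planes: int) -> List[int]:
    """Emit magnitude bit-planes, most significant first (no signs, no sorting)."""
    bits = []
    for plane in range(planes - 1, -1, -1):
        for v in values:
            bits.append((v >> plane) & 1)
    return bits


def bitplane_decode(bits: List[int], count: int, planes: int) -> List[int]:
    """Rebuild magnitudes from any prefix of the stream; missing bits read as 0."""
    values = [0] * count
    for i, bit in enumerate(bits):
        plane = planes - 1 - i // count
        values[i % count] |= bit << plane
    return values
```

Encoding [13, 2, 7, 0] with four planes and decoding only the first 8 bits yields [12, 0, 4, 0]; decoding the full 16 bits returns the original values.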

The Embedded Zerotree of Wavelets (EZW)[103] algorithm is one early example, which led to the more efficient Set Partitioning in Hierarchical Trees (SPIHT)[95] coding method and its derivatives [120, 125]. Each of these compression algorithms follows a similar process of three main steps. First, the DWT is applied to the data, resulting in a set of coefficients in the wavelet domain. Second, these (typically floating-point) coefficients are requantized as signed fixed-point numbers. Finally, this fixed-point representation is encoded in an algorithm-specific way, which results in a sequence of bits. Any truncated portion of this bitstream can be decoded into a signed fixed-point approximation to the wavelet coefficients, after which the inverse DWT restores an approximation to the original data. Each algorithm can be used effectively on scalar data, imagery, or even 3D volumetric data. For clarity, I discuss the one-dimensional approach first, and then extend it to imagery.
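The second step, requantizing floating-point coefficients as signed fixed-point numbers, can be sketched as a sign-magnitude conversion. The `frac_bits` parameter, which sets the fixed-point resolution, is an assumption of this sketch; in practice the quantization level would be chosen from the dynamic range of the data.

```python
import numpy as np


def requantize(coeffs: np.ndarray, frac_bits: int):
    """Float wavelet coefficients -> sign bits plus integer magnitudes.

    Magnitudes are stored in units of 2**-frac_bits (assumed resolution).
    """
    magnitudes = np.rint(np.abs(coeffs) * 2 ** frac_bits).astype(np.int64)
    signs = (coeffs < 0).astype(np.int8)
    return signs, magnitudes


def dequantize(signs: np.ndarray, magnitudes: np.ndarray, frac_bits: int) -> np.ndarray:
    """Inverse of requantize, up to half a quantization step."""
    values = magnitudes.astype(float) / 2 ** frac_bits
    return np.where(signs == 1, -values, values)
```

The number of bit-planes an embedded coder must handle is then `int(magnitudes.max()).bit_length()`.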

3.1.5  Image  Synthesis  using  Texture  Patches 

Natural photographs exhibit intra-image redundancy, including smoothly varying colors and luminance. Embedded wavelet coding relies on this redundancy to transmit a facsimile of the image in fewer bytes than the uncompressed image would consume. With respect to its use for underwater image telemetry, two aspects of embedded wavelet compression merit closer scrutiny. First, fine texture details are quickly lost at high compression ratios, due to the smoothing effects of wavelet compression. This is undesirable if the information surface operators hope to extract relies on the fine-scale texture of the image. Brown rocks and brown coral appear quite similar once fine-scale details are smoothed out, for example, yet differentiating between these classes may be important to surface operators. Second, when multiple photographs are taken of a single area, there will be significant redundancy not only within each image, but also across the set of images. Wavelet compressors are highly efficient at encoding intra-image redundancy, yet make no use of this inter-image redundancy.

Video compression techniques do make use of recurrence across frames, yet they do so in a time-localized manner. Rather than considering every previously recorded frame, they consider only those frames in immediate temporal proximity, and assume that motion is a dominant cause of inter-frame changes. For example, if a video began by recording an outdoor scene, then moved indoors to a different scene, and then returned to the same outdoor scene, the two outdoor segments would be compressed completely independently. Compression of the second outdoor segment would not take advantage of the fact that it is highly redundant with the first. Each video frame is compressed only relative to similar frames in an adjacent time period.

When AUVs are compressing sequences of repetitive static imagery, the overlap of sequential frames may be low or non-existent, limiting the utility of motion compensation as a compression technique. However, prior information may be available about the contents of the images in terms of texture, even though individual images may come from different times and locations and thus vary significantly. Even though a single image may not look very similar to the previous image overall, it may feature textures and objects very similar to those seen in prior images. Section 3.3 presents a compression option, nicknamed Image Synthesis, which exploits the inter-frame redundancy of underwater data to provide extremely high compression while preserving some texture information.
Figure 3-4: A wavelet decomposition at upper left, followed by reconstructions from increasingly long SPIHT bitstreams. As the number of bits grows, the reconstruction approaches the original coefficients. Coefficient signs are not depicted.

3.2  Fully  Embedded  Wavelet  Coding 

SPIHT, its progenitor EZW[103], and similar algorithms treat the wavelet decomposition as a tree of coefficients, rooted at the lowest-level approximation coefficients. In many real signals, locations with large-magnitude coefficients at high levels also tend to have larger-magnitude coefficients at lower levels. Fully embedded wavelet coders exploit this cross-level correlation; SPIHT does so via a clever sorting algorithm. As the authors write in their tutorial on the topic [78, p. 95], set partition coding
...  is  a  procedure  that  recursively  splits  groups  of  [coefficients]  guided 
by a sequence of threshold tests, producing groups of elements whose
magnitudes  are  between  two  known  thresholds. 

A SPIHT-encoded bitstream consists of a sequence of refinement bits and sorting bits, interlaced in a data-dependent order. Sorting bits provide an efficient way to identify high-magnitude, and therefore important, wavelet coefficients. Refinement bits provide a continually improving estimate for the magnitude of a wavelet coefficient. In particular, sorting bits indicate:

•  whether  a  coefficient  is  greater  in  magnitude  than  the  current  threshold,  or 
'significant', 

•  whether any descendant in the wavelet tree of the currently considered coefficient is 'significant', and

•  whether  any  grand-descendant  is  significant. 

Refinement bits indicate either the sign of a coefficient, or a single bit of a coefficient's magnitude.

Fig. 3-4 shows the progressive reconstruction of a small set of coefficients using an increasing number of (indicated) bits.
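The three sorting tests listed above can be illustrated on a one-dimensional wavelet tree. This sketch covers only the significance tests, not the full SPIHT algorithm, which also maintains its lists of insignificant pixels, insignificant sets, and significant pixels. The tree layout (coefficients stored as [approximation, coarsest detail, ..., finest details], with the children of index i >= 1 at 2i and 2i + 1) and the function names are assumptions for illustration.

```python
from typing import List, Sequence


def children(i: int, n: int) -> List[int]:
    """Offspring of coefficient i in an assumed 1-D dyadic tree of length n."""
    kids = [1] if i == 0 else [2 * i, 2 * i + 1]
    return [k for k in kids if k < n]


def descendants(i: int, n: int) -> List[int]:
    """All coefficients below i in the tree, level by level."""
    out, frontier = [], children(i, n)
    while frontier:
        out.extend(frontier)
        frontier = [g for k in frontier for g in children(k, n)]
    return out


def significant(coeffs: Sequence[float], indices: List[int], threshold: float) -> bool:
    """True if any listed coefficient reaches the current threshold."""
    return any(abs(coeffs[k]) >= threshold for k in indices)


def sorting_bits(coeffs: Sequence[float], i: int, threshold: float):
    """(coefficient significant, any descendant significant, any grand-descendant significant)."""
    n = len(coeffs)
    offspring = children(i, n)
    below = descendants(i, n)
    grand = [d for d in below if d not in offspring]
    return (int(abs(coeffs[i]) >= threshold),
            int(significant(coeffs, below, threshold)),
            int(significant(coeffs, grand, threshold)))
```

In SPIHT the threshold starts at the largest power of two not exceeding the maximum coefficient magnitude and halves on each pass, so a single set test can declare an entire subtree insignificant with one bit.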

3.2.1  Scalar  Environmental  Data 

Figs. 3-5 and 3-6 display approximations of the original scalar temperature and Eh data of Fig. 3-1, respectively, using an example fully embedded wavelet coder, SPIHT, compared to the more traditional approach of interpolating quantized samples. The two coders are compared for each signal at three different encoding sizes: 28 bytes, 56 bytes, and 112 bytes. Paying particular attention to the extrema of each signal, the SPIHT-encoded signals clearly represent both original signals better than spline interpolation does at all three sizes. An additional side effect of fully embedded wavelet coding is that the reconstructed signal has been de-noised; discarding low-magnitude coefficients is an effective form of noise reduction [124].

To quantify the benefits of encoding the temperature and Eh data with SPIHT, Fig. 3-7 plots the root mean square (RMS) error versus signal size for SPIHT and two interpolation coding methods. SPIHT displays a significant improvement in data fidelity across a wide range of transmission rates when compared to simple subsampling. The received signal is both qualitatively (Figs. 3-5 and 3-6) and quantitatively (Fig. 3-7) more similar to the original data than the interpolated data points.

Figure 3-5: SPIHT-encoded scalar temperature data at different levels of compression, compared to interpolating quantized samples (panels include Spline-Interpolated 16-bit Fixed Point Samples). The original data is shown in gray in each graph, while the black lines represent the various approximations. The approximations are grouped into three sets of SPIHT vs. fixed-point comparisons, where each pair is encoded with the same number of bytes.

Figure 3-6: SPIHT-encoded scalar Eh data at different levels of compression, compared to interpolating quantized samples (panels include Spline-Interpolated 16-bit Fixed Point Samples and SPIHT Encoded with 28 Bytes). The same original data is shown in gray in each graph, while the black lines represent the various approximations. The approximations are grouped into three sets of SPIHT vs. fixed-point comparisons, where each pair is encoded with the same number of bytes.

Figure 3-7: Comparison of SPIHT encoding with subsampling methods for temperature data (top) and reduction potential data (bottom) across a wide range of encoding qualities (x-axis: Bytes per Packet).
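The RMS error metric and a subsampling baseline can be sketched as follows. The baseline here uses linear interpolation via `np.interp` as a simplification; the figures above use spline interpolation, and the function names are hypothetical.

```python
import numpy as np


def rms_error(original, approx) -> float:
    """Root mean square difference between two equal-length signals."""
    original = np.asarray(original, dtype=float)
    approx = np.asarray(approx, dtype=float)
    return float(np.sqrt(np.mean((original - approx) ** 2)))


def subsample_interpolate(signal, keep: int) -> np.ndarray:
    """Baseline: keep `keep` evenly spaced samples, linearly interpolate the rest."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal))
    picks = np.linspace(0, len(signal) - 1, keep).round().astype(int)
    return np.interp(t, picks, signal[picks])
```

A brief spike that falls between retained samples is lost entirely by this baseline, which is consistent with the observation above that extrema are where the wavelet coder shows its clearest advantage.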

3.2.2  Segmenting  Scalar  Data 

While a single image is easy to consider as a distinct 'resource', transmitting environmental sensor data requires identifying a section of data to transmit. This is best done by breaking a time series into large chunks: for correlated time-series data, compressing a few samples at a time is much less efficient than compressing long sequences at once. Fig. 3-8 shows this result for piecewise compression of a long series of temperature data.
Figure 3-8: Reconstruction error (Y axis, RMSE) versus compression level (X axis, bits) for the two-hour sequence of temperature data. Each plotted line shows the result of compressing the full dataset, but doing so in subsets of a different length at a time. Since the original temperature data was collected at 4 Hz, compressing 8192 samples at a time would be equivalent to transmitting updated data roughly every 34 minutes, versus every 32 seconds when data is compressed 128 samples at a time. Encoding more samples in each transmission lowers the reconstruction error for any given compression level.
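The latency side of the trade-off in Fig. 3-8 is simple arithmetic: a chunk can only be sent once all of its samples exist. The sketch below assumes non-overlapping chunks and the 4 Hz sampling rate quoted in the caption; the function names are hypothetical and only illustrate the bookkeeping.

```python
def split_into_chunks(samples, chunk_len):
    """Split a time-series into consecutive, non-overlapping chunks (last may be short)."""
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def update_interval_s(chunk_len, sample_rate_hz):
    """Seconds between transmissions if each message carries one complete chunk."""
    return chunk_len / sample_rate_hz


if __name__ == "__main__":
    rate_hz = 4.0                    # sampling rate of the temperature data in Fig. 3-8
    n_samples = 2 * 60 * 60 * 4      # two hours of samples at 4 Hz
    for n in (128, 512, 2048, 8192):
        chunks = split_into_chunks(list(range(n_samples)), n)
        print(f"{n:5d} samples/chunk: {len(chunks):3d} chunks, "
              f"one every {update_interval_s(n, rate_hz):7.1f} s")
```

Longer chunks buy compression efficiency at the price of a longer wait before the surface sees any update from that interval.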


3.2.3 Spatially Varied Quantization

Prior to being coded by the embedded wavelet coders described in this chapter, wavelet coefficients are requantized into a standard sign-magnitude representation. While the level and method of quantization depend on the dynamic range of the time-series data, the quantization is typically constant for all wavelet coefficients.

Occasionally, it may be valuable to provide higher fidelity to certain sections of data. Emphasizing recent data would allow decisions to be made about nearby
Figure 3-9: Standard SPIHT compared to time-varying quantization. Note that time-varying quantization performs better on the more recent data.

features of interest before they are left far behind. Images may have one or more regions of interest that warrant a higher-quality encoding. I propose that this can be achieved by artificially pre-scaling the wavelet coefficients using a weighting function prior to quantization. As wavelet coders prioritize higher-magnitude coefficients, this leads to greater detail being conveyed in those areas of the reconstructed signal, at the cost of lower detail elsewhere. The receiver must also know the weighting function so that the inverse weighting can be applied after decoding the wavelet coefficients. This strategy was used to generate Fig. 3-9: wavelet magnitudes were artificially pre-scaled prior to encoding them with SPIHT.


$$
q_x = \begin{cases} 2^{16}, & n = 1 \\ 2^{\,16 + 5x/n}, & 0 \le x < n \end{cases} \tag{3.1}
$$
The weighting function used in Fig. 3-9 was generated using the logarithmically increasing sequence of quantization coefficients shown in Equation 3.1, where n is the number of coefficients. This results in more recent data being encoded with higher fidelity than older data. In the case of an image, the coordinates and scale for regions of interest could be transmitted along with the encoded image data and used to derive the applied weighting function.
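A minimal sketch of pre-scaling and its inverse, assuming Equation 3.1 is read as weights 2^(16 + 5x/n) and that coefficient index x increases toward more recent data. A uniform quantizer with a fixed step stands in for the truncated embedded coder (an assumption; SPIHT itself works bit-plane by bit-plane): after dividing by the weights, each coefficient's error is bounded by half the step divided by its weight, so recent coefficients are reconstructed more precisely.

```python
import numpy as np


def recency_weights(n, base_exp=16.0, span_exp=5.0):
    """w[x] = 2**(base_exp + span_exp * x / n) for x = 0..n-1 (assumed form of Eq. 3.1)."""
    x = np.arange(n)
    return 2.0 ** (base_exp + span_exp * x / n)


def encode(coeffs, w, step):
    """Pre-scale by w, then quantize with a fixed step (stand-in for a truncated coder)."""
    return np.round(np.asarray(coeffs, dtype=float) * w / step).astype(np.int64)


def decode(q, w, step):
    """Undo quantization and pre-scaling; the receiver must know both w and step."""
    return q * step / w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.normal(size=1024)        # stand-in wavelet coefficients, oldest first
    w = recency_weights(c.size)
    step = 2.0 ** 19                 # hypothetical quantization step
    err = np.abs(c - decode(encode(c, w, step), w, step))
    print("oldest quarter, max error:", err[:256].max())
    print("newest quarter, max error:", err[-256:].max())
```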

3.2.4 Imagery

(a) Pillow Lava (Southern Mid-Atlantic Ridge)
(b) Coral Reef (Puerto Rico)
(c) Airplane (Santa Barbara, CA)
(d) Fish and Sand (Santa Barbara, CA)

Figure 3-10: Representative imagery, captured by the SeaBED AUV. The four images shown are used to illustrate the performance of SPIHT on typical underwater imagery, relative to the current state of the art.
SPIHT was originally designed for photo compression, and can be applied to higher-dimensional datasets as well as to scalar data. Two-dimensional data like imagery is simply transformed with the 2D form of the DWT and then SPIHT-coded following a process similar to the 1D version. To encode color images, each color plane is encoded independently. As humans are more sensitive to changes in luminance than to changes in chromaticity, encoding in either the YUV or Lab color space simplifies allocating bits to the most important data: the chrominance planes (U and V, or a and b) are encoded with a much smaller allowance than the luminance plane.
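A sketch of the color handling described above, assuming the common analog BT.601 YUV definition and a purely hypothetical 80/10/10 split of the byte budget between Y, U, and V; the text does not specify the split, and each plane would then be SPIHT-coded with its own allowance.

```python
import numpy as np


def rgb_to_yuv(rgb):
    """Convert RGB values (..., 3) to YUV using BT.601 luma weights."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)              # blue-difference chrominance
    v = 0.877 * (r - y)              # red-difference chrominance
    return np.stack([y, u, v], axis=-1)


def split_budget(total_bytes, luma_fraction=0.8):
    """Hypothetical allocation: most bytes to Y, the remainder shared by U and V."""
    y_bytes = int(total_bytes * luma_fraction)
    chroma = total_bytes - y_bytes
    return y_bytes, chroma // 2, chroma - chroma // 2
```

For a 1000-byte message this gives 800 bytes to luminance and 100 bytes to each chrominance plane.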

Fig. 3-10 displays four images, captured by the SeaBED AUV, which are representative of the types of images desired by human operators during an AUV mission. These images were resampled to 1024 x 1024 pixel source images and then coded using SPIHT, JPEG 2000, and progressive JPEG at three different levels of quality. Fig. 3-11 displays the reconstruction error versus number of bytes for each of the images using each coder. The JPEG 2000 data has visible nonlinearities indicating discrete quality 'checkpoints,' while the errors associated with SPIHT follow a smooth reduction curve as the size of the transmitted file increases. These graphs also show that JPEG is largely incapable of encoding large images at the small file sizes that SPIHT can reach.

Fig. 3-12 displays the same metrics as Fig. 3-11, but for versions of the images in Fig. 3-10 that have been resampled to a smaller size of 256 x 256 pixels. Again, discrete quality 'checkpoints' are visible in the progressively coded JPEG 2000 data, while SPIHT coding provides a smooth compression curve. JPEG performs adequately at larger file sizes; however, its error increases substantially at smaller file sizes. It was hypothesized that the lower-quality (smaller q) JPEG coders would perform better than the higher-quality JPEG coders at lower numbers of bits. However, somewhat counter-intuitively, the higher-quality JPEG coders typically resulted in better image quality (lower RMS error) even for small target file sizes. JPEG is simply not suited to compression at these ratios.

The mean reconstruction error across all four images of Fig. 3-10 is displayed in Fig. 3-13. The solid lines correspond to the 1024 x 1024 pixel source images of
[Figure 3-11 plots, one per image: (a) Pillow Lava, (b) Coral Reef, (c) Airplane, (d) Fish and Sand. X axis: filesize in bytes (128 to 16384). Series: SPIHT, JP2K, JPEG q=10, JPEG q=20, JPEG q=40.]

Figure 3-11: Reconstruction error versus number of bytes for the four representative images shown in Fig. 3-10, encoded from 1024 x 1024 pixel source images. The progressively coded JPEG 2000 (JP2K) data shows discrete quality 'checkpoints' as 'bumps', in contrast with the smooth progression of the SPIHT coding. JPEG is largely incapable of encoding large images at these small sizes.
[Figure 3-12 plots, one per image: (a) Pillow Lava, (b) Coral Reef, (c) Airplane, (d) Fish and Sand. X axis: filesize in bytes (128 to 16384). Series: SPIHT, JP2K, JPEG q=10, JPEG q=20.]


Figure 3-12: Reconstruction error versus number of bytes for the four images shown in Fig. 3-10, encoded from small, 256 x 256 pixel source images. Discrete quality 'checkpoints' are very evident as 'bumps' in the progressively coded JPEG 2000 (JP2K) data, while SPIHT coding provides a smooth compression curve. JPEG performs adequately at larger file sizes, until reaching the target encoding quality. Counter-intuitively, higher-quality (larger q) JPEG compression typically yields better image quality even at lower file sizes.
Figure 3-13: The mean reconstruction error across all four images from Fig. 3-10. The solid lines indicate versions of the images that were 1024 x 1024 pixels in size; the dashed lines represent images that are 256 x 256 pixels. Beyond the benefits of fully embedded coding, SPIHT clearly provides the highest compression performance for both small and large imagery.


Fig. 3-11, while the dashed lines represent the 256 x 256 pixel source images of Fig. 3-12. This summary chart shows that, beyond the aforementioned benefits of fully embedded coding, SPIHT clearly provides higher compression performance than JPEG or JPEG 2000 for both small and large target image sizes.
3.3 Image Synthesis

[Figure 3-14 plot. X axis: image number (increasing time).]

Figure 3-14: Percentage of coral cover for each image. The black line is the 11-point median-filtered percentage, computed from the individual samples. Note that areas dominated by sand and areas dominated by rubble both correspond to areas of low coral cover.

Extensive anthropogenic damage to shallow-water coral has been well documented [38] [29], yet it is much more challenging to study the health of coral reefs below diver depths. AUVs allow scientists not only to reach deeper than a single diver, but also to cover more area [4]. Unfortunately, transmitting imagery at a quality high enough to distinguish healthy coral from rubble remains difficult: the captured imagery must be heavily subsampled, and fine-scale texture information suffers from significant blurring in wavelet-compressed imagery at high compression ratios.

Wavelet compressors are highly efficient at encoding intra-image redundancy, achieving among the highest known compression ratios on single images. Given the limited bandwidth available and the significant quantity of imagery that AUVs capture, it would be valuable to exploit inter-frame redundancy as well. One way to do this is to compute scalar metrics (e.g., percent coral cover) and transmit those instead of the individual images. Such a technique allows the content of imagery to be communicated much faster than a compressed image, describing entire datasets in a few hundred bytes. When such a metric is the goal of data acquisition (e.g., measuring the percentage of coral cover), this scalar-metric approach is an extremely efficient way of communicating information about imagery acquired subsea. Fig. 3-14 plots the percentage of each image covered in coral for a dataset acquired by the author as part of a study on the long-term health of deep-sea coral reefs near Puerto Rico. As illustrated, distinctive changes in imagery correspond well with changes in the percentage of coral cover.
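Computing such a scalar metric from a classification mask, and smoothing the resulting series with an 11-point median filter as in Fig. 3-14, takes only a few lines. The sketch assumes a hypothetical integer label for coral and pads the series by repeating its end values; neither detail is specified in the text. It requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

CORAL = 2  # hypothetical integer label for the coral class


def percent_cover(mask, cls=CORAL):
    """Percentage of mask pixels assigned to class `cls`."""
    mask = np.asarray(mask)
    return 100.0 * np.count_nonzero(mask == cls) / mask.size


def median_filter_1d(x, width=11):
    """Running median over an odd-width window, repeating end values at the edges."""
    x = np.asarray(x, dtype=float)
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    return np.median(sliding_window_view(padded, width), axis=1)
```

A per-dataset series is then `median_filter_1d([percent_cover(m) for m in masks])`, one value per image, which is the few-hundred-byte representation discussed above.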

Still, this representation sacrifices information about the distribution of coral within these specific images, and about the character of each image more generally. I now present a middle ground between these two extremes (full imagery and scalar image statistics), which I call Image Synthesis.

3.3.1 Image Synthesis

Image Synthesis exploits inter-image repetition in texture space to summarize information about the contents of an image. It allows an extremely compact representation to be communicated while still permitting estimates of scalar texture-based metrics, such as the percentage of coral cover, from the received image. The resulting imagery can be transmitted in much less space than with the compression techniques described previously in this chapter, consuming only tens of bytes per image on average. To achieve high compression ratios, I exploit both inter-image and intra-image redundancy in texture space by describing each new image as a set of previously seen image texture patches. This allows imagery with rich textures to be reproduced on the surface, at the cost of a decreased ability to communicate previously unseen imagery. A single image compressed using each of these techniques is shown in Table 3.1, along with a time-series representation of the percentage of coral cover. Parts (a), (b), and (c) each represent embedded-wavelet compressed versions of the image at different sizes. Part (d) represents the image compressed using the Image Synthesis method. The largest SPIHT image (a) clearly provides a better representation of the true image than the synthesized image (d). However, compared with a SPIHT image of comparable file size (c), the synthesized image (d) is more appealing. Additionally, the percentage of coral cover could be trivially computed from the synthesized image, whereas it is nearly impossible to determine from the comparably sized SPIHT image. Part (e) shows that computing the percentage of coral cover from synthesized images (blue line) provides a reasonable estimate of the true coral cover (black line). Note that, alternatively, the percentage of coral cover for every image could be transmitted as a scalar time-series (green line).

The procedure for compressing a sequence of images using Image Synthesis consists of four steps, depicted in Table 3.2. First, source imagery is segmented and classified into areas of similar texture. Second, this segmented image is drastically subsampled, resulting in a low-resolution map of texture blocks within the image. Third, each of these masks is encoded in a non-sequential but deterministic order, using an arithmetic coder. Finally, when each mask is received on the surface, a texture synthesis procedure is used to synthesize an image similar to the one compressed. I describe each of these steps in turn below.

3.3.2 Image Segmentation

Segmentation and classification of seafloor imagery remains an active research topic in underwater robotics. Pizarro, Rigby et al. [82][83] have presented results obtained with a 'bag-of-features' approach, using SIFT descriptors as their feature space. Loomis [62] obtained high classification rates using only 5 x 5 patches as textons, together with a classifier based on boosting. The approach pursued here for this dataset is inspired by, though different from, the work of Kaeli et al. [56], which used morphological image processing on a similar dataset. I document below the segmentation and classification used on this dataset in the interest of completeness, though the recent work described above would provide a more appropriate starting point for future implementers seeking a flexible implementation. The segmentation and classification procedure described here produces a classification mask containing regions in one of four classes: sand, rubble, M. annularis coral, and unclassifiable (such as shadows). For all examples in this section, the source imagery is 504 x 504 pixels.

Each image is processed independently. Initially it is converted to an eight-bit
(a) SPIHT (400x400 @ 0.04bpp) (20,000 Bytes)
(b) SPIHT (400x400 @ 0.004bpp) (2,000 Bytes)
(c) SPIHT (400x400 @ 0.0004bpp) (200 Bytes)
(d) Synthesized (116 Bytes)

Table 3.1: Coral reef imagery, encoded using the full spectrum of techniques described by this thesis. Note that calculating the percentage of coral cover from the 200 Byte SPIHT-encoded image would be impossible, yet it is trivial for the synthesized image. Communicating the same statistic as scalar data allows the entire dataset to be represented in the same number of bytes. Each of the time series has been filtered with an 11-point median filter for clarity.
(a) Source Image
(b) Segmented + Classified Mask
(e) Synthesized Image, with Cuts
(f) Final Synthesized Image

Table 3.2: The original image is shown in (a). The image following segmentation and classification is shown in (b). Part (c) displays the subsampled image, and (d) represents the relative sizes of the data as encoded naively (8 bits per pixel), using quantization alone (2 bits per pixel), and using arithmetic coding (0.07 bits per pixel). The reconstructed image is shown in (e) and (f), with and without inter-tile boundaries.
(a) Source Image; (b) Classification Results
(c) Source Image; (d) Classification Results
(e) Source Image; (f) Classification Results

Table 3.3: Segmentation and classification results, computed using hand-tuned local image statistics.
Y'CbCr colorspace, separating luminance and chrominance data. For each resulting channel, the local mean and variance are computed over a 5 x 5 neighborhood at each pixel. A sequence of binary thresholding operations and comparisons over these statistics forms the basis for the classifications. After each class is identified, its mask is smoothed with a morphological Alternating Sequential Filter (ASF) [25], using a circular structuring element as described in Kaeli et al. [56]. Classification was performed sequentially on a per-class basis, using manually tuned thresholds. Despite the manual selection of thresholds, the final classification procedure performed well on each of the 919 images in the dataset.
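The per-pixel statistics that feed the thresholds can be sketched with NumPy as below. The 5 x 5 window comes from the text; the edge-replication padding and the example threshold rule (bright, low-variance pixels as candidate sand) are assumptions, since the actual thresholds were hand-tuned and are not listed here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean_var(channel, size=5):
    """Mean and variance of the size x size neighborhood around every pixel."""
    half = size // 2
    padded = np.pad(np.asarray(channel, dtype=float), half, mode="edge")
    win = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return win.mean(axis=(-2, -1)), win.var(axis=(-2, -1))


def threshold_mask(mean, var, mean_min, var_max):
    """Example binary rule (hypothetical): bright, low-texture pixels, e.g. candidate sand."""
    return (mean >= mean_min) & (var <= var_max)
```

In the procedure described above, one such function would be applied per channel, and several rules would be combined before morphological smoothing.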

The resulting image masks for three distinct images are shown in Table 3.3. After a classification mask has been produced for each image, the mask is subsampled. This subsampling is performed not through simple decimation or interpolation, but by computing the dominant class within a grid of fixed-size windows. Grid cells overlap adjacent grid cells by a constant amount, as seen in Fig. 3-15.
Figure 3-15: A schematic of the grid used for subsampling the classification mask, illustrating the overlap between adjacent cells. Note that the lack of overlap at image edges means that a simple interpolation would bias the resulting mask.

This procedure accounts for the overlap between adjacent texture patches that will occur during the final phase of image synthesis. As a result of this overlap, subsampling a 504 x 504 pixel image with a 24 x 24 pixel window that overlaps by four pixels yields a square mask 25 pixels per side ((504 - 4)/(24 - 4) = 25), rather than 504/24 = 21 pixels per side as might be expected. Within each window, the most common identified class (statistical mode) is taken as the subsampled tile class. For the imagery presented here, 24 x 24 pixel grid cells were used, overlapping by 4 pixels.
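The overlapping-window mode filter can be sketched as follows. This is a minimal version assuming non-negative integer class labels and windows that step by the tile size minus the overlap; ties are broken toward the lower class index, which is an assumption, as the text does not say how ties were resolved.

```python
import numpy as np


def subsample_mode(mask, tile=24, overlap=4, n_classes=4):
    """Dominant class in each tile x tile window; windows advance by tile - overlap."""
    mask = np.asarray(mask, dtype=np.int64)
    step = tile - overlap
    rows = (mask.shape[0] - overlap) // step
    cols = (mask.shape[1] - overlap) // step
    out = np.empty((rows, cols), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            win = mask[r * step:r * step + tile, c * step:c * step + tile]
            counts = np.bincount(win.ravel(), minlength=n_classes)
            out[r, c] = np.argmax(counts)  # first maximum wins, so ties go low
    return out
```

With the parameters used here, a 504 x 504 mask yields (504 - 4) // 20 = 25 windows per side, matching the 25 x 25 masks described above.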

3.3.3 Arithmetic Coding of Texture Masks

At this point, a low-resolution mask of texture classes has been generated. The simplest approach would be to represent each texture class with an integer and sequentially encode the mask pixels. For the four-class images shown here, each mask pixel could be represented in only two bits, resulting in a size of ⌈25 x 25 x 2 / 8⌉ = 157 bytes per mask for the parameters used here. Compressing an image with dimensions (h_i, w_i), using patches of dimensions (h_p, w_p) that overlap by o pixels in each dimension, each containing one of c texture classes, would require

$$
\frac{h_i - o}{h_p - o} \cdot \frac{w_i - o}{w_p - o} \cdot \lceil \log_2 c \rceil
$$

bits. Each texture class is not uniformly probable, however, and the pixels are not statistically independent of each other. Given these facts, entropy coding offers the ability to shrink this image representation even further.
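The fixed-length baseline can be checked with a few lines of arithmetic. The function name is hypothetical, and the check assumes the image dimensions are a whole number of patch strides, as they are for the 504 x 504 images used here.

```python
import math


def naive_mask_bits(h_i, w_i, h_p, w_p, o, c):
    """Bits to send a texture mask with a fixed-length code of ceil(log2 c) bits per tile."""
    rows, rem_r = divmod(h_i - o, h_p - o)
    cols, rem_c = divmod(w_i - o, w_p - o)
    if rem_r or rem_c:
        raise ValueError("image size is not a whole number of patch strides")
    return rows * cols * math.ceil(math.log2(c))


if __name__ == "__main__":
    bits = naive_mask_bits(504, 504, 24, 24, 4, 4)
    print(bits, "bits =", math.ceil(bits / 8), "bytes")
```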

Arithmetic coding allows for the optimal encoding of a sequence of symbols when the probability distribution of those symbols is known to both the encoder and decoder. Adaptive arithmetic coding [129] provides near-optimal encoding by learning the probability distribution as the sequence is encoded or decoded. The simplest adaptive coders generate a frequency table of coded symbols and use that as the probability distribution for the next symbol. More elaborate probabilistic models can be built by tracking the conditional frequencies of symbols based on, for instance, the previously encoded symbol. In the sequence AAAABBBBAAAAAAAA, for instance, B is not a particularly likely symbol (4 of 16). If we track conditional probabilities, however, we see that B has a much higher likelihood of appearing immediately after another B (3 of the 4 symbols following a B are B), and we can build a better probabilistic model of our sequences.

The choice of an appropriate adaptive model can have a significant impact on the efficiency of the arithmetic coding. Table 3.4 shows the results of encoding a sequence of 919 texture masks using an adaptive arithmetic coder, with each of
Context              Mean (Bytes)   Median (Bytes)   Std. Dev. (Bytes)
m_{i-1}              49.60          51.0             14.65
m_{i-1}, m_{i-2}     49.44          51.0             14.62
m_{i-1}, m_{i-w}     40.98          42.0             11.89

Table 3.4: Comparison of Arithmetic Coding Models. The table shows the average size in bytes for transmitting a single image using the specified adaptive model. Here m_i is the current mask pixel, m_{i-1} and m_{i-2} are the preceding pixels in raster order, and m_{i-w} is the pixel directly above (w is the mask width).


three different adaptive models. The first model conditions probabilities upon the texture class of the previous pixel. The second model conditions the probabilities upon both of the previous two pixels. The third, and most effective, model conditions the probabilities on the previous pixel and the pixel directly above the current pixel. Note that it is not possible to condition on future pixels, as the decoder will not have access to those pixels until after decoding the current one. As seen in Table 3.4, conditioning probabilities upon the pixel directly above the current one, in addition to the previous pixel, results in a significant improvement in encoding efficiency (a mean of about 41 rather than 50 bytes per mask). The encoded size of each mask is plotted in Fig. 3-16 for the first and third adaptive models.
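The three context models can be compared without implementing a full arithmetic coder by summing the ideal code lengths, -log2 p, that an adaptive coder would spend on each symbol; a practical arithmetic coder comes close to this total. The sketch below assumes Laplace-smoothed counts (every class starts at 1) and uses -1 as a hypothetical 'outside the mask' value at the left and top edges. It is illustrative only and is not the coder that produced Table 3.4.

```python
import math
from collections import defaultdict


def adaptive_code_length_bits(mask, n_classes, context):
    """Ideal code length (bits) of an adaptive model with per-context frequency counts."""
    counts = defaultdict(lambda: [1] * n_classes)  # Laplace smoothing: all counts start at 1
    bits = 0.0
    for r, row in enumerate(mask):
        for c, sym in enumerate(row):
            table = counts[context(mask, r, c)]
            bits -= math.log2(table[sym] / sum(table))
            table[sym] += 1  # update after coding, exactly as the decoder can
    return bits


def left(mask, r, c):
    """First model: the previous pixel in the row (-1 at the left edge)."""
    return mask[r][c - 1] if c > 0 else -1


def two_left(mask, r, c):
    """Second model: the previous two pixels in the row."""
    return (left(mask, r, c), mask[r][c - 2] if c > 1 else -1)


def left_and_above(mask, r, c):
    """Third model: the previous pixel and the pixel directly above."""
    return (left(mask, r, c), mask[r - 1][c] if r > 0 else -1)
```

Dividing the result by 8 gives an estimate in bytes comparable to the per-mask sizes reported above.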

Source images with large regions of constant texture will have less entropy in the texture mask. This can significantly increase the efficiency of the arithmetic coder, resulting in a smaller compressed representation of the image. As one example, the image in Table 3.5 consists almost entirely of two texture classes, rubble and sand, arranged as two large regions. The texture mask, before and after downsampling to 25 x 25, is also shown in Table 3.5. After arithmetic coding, this texture mask consumes only 27 bytes, compared with an average of 41 bytes for this dataset. The entropy of the textures in the compressed imagery is easily computed from the transmitted imagery, and could be used subsea for image selection, computed on the surface for received images, or transmitted as a separate scalar statistic.
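As a rough proxy for this measure, the order-0 entropy of a mask's class histogram can be computed as below. Which entropy the text intends is an assumption here: this version ignores the spatial correlation that the context models exploit, so it will not match the arithmetic-coded size of a mask.

```python
import numpy as np


def mask_entropy_bits(mask):
    """Shannon entropy (bits per pixel) of the class histogram of a texture mask."""
    _, counts = np.unique(np.asarray(mask), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A single-texture mask scores 0 bits per pixel, while a mask split evenly between two classes scores 1 bit per pixel regardless of how the classes are arranged.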

Rather than encode these texture masks in the order that the source images were captured, I encode the masks in a deterministic manner that provides samples from across the entire sequence, gradually filling in the entire time-series of images
Figure 3-16: The number of bytes required to transmit each image using arithmetic coding, shown for two different adaptive models. By utilizing both the previous neighbor and the neighbor above each new pixel, the data may be compressed further (blue) than when only utilizing the previous neighbor (green). Note the much higher performance of the model incorporating the vertical neighbor when coding a sequence of images near the end, each of which is entirely sand. Both signals are filtered with an 11-point median filter for clarity.

rather than encoding each image in turn. This allows surface operators to quickly get a rough idea of an entire dataset, rather than a clear view of only the beginning of a dataset. Specifically, for a sequence of n images, I iterate through the sequence with an initial step size s, beginning with the first image. When I reach the end of the sequence, I halve the step size and continue from the beginning of the sequence, skipping images that have already been encoded.
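A sketch of this coarse-to-fine ordering follows. The initial step size is left as a parameter, and the power-of-two value in the usage line is an assumption for illustration, not a value taken from the text.

```python
def coarse_to_fine_order(n, initial_step):
    """Visit image indices with a halving step size, skipping ones already emitted."""
    order, seen = [], set()
    step = max(1, initial_step)
    while True:
        for i in range(0, n, step):
            if i not in seen:
                seen.add(i)
                order.append(i)
        if step == 1:          # the final pass covers every remaining index
            return order
        step = max(1, step // 2)


if __name__ == "__main__":
    print(coarse_to_fine_order(8, 4))   # first image, then middle, then quarters, ...
```

Because the final pass uses a step of one, every image is eventually sent exactly once, while any prefix of the order is spread across the whole dive.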

3.3.4 Image Synthesis

After receipt of a texture mask, a new image is synthesized by the recipient to match the form of the mask. Efros and Freeman [30] present a method for synthesizing textures, nicknamed "Image Quilting". The synthesized textures are generated from patches of a source texture, selected to meet a minimum error criterion. Much of the seafloor consists of large swaths of repetitive textures: sand and rubble on a macro scale, and the fine-grained textures of coral on a micro scale.
(a) Source image exhibiting large constant-texture regions
(b) Texture mask for source image
(c) Downsampled texture mask

Table 3.5: Constant-texture images, like this one, have less entropy in the texture mask. This results in greater coding efficiency during the arithmetic coding stage.

This texture synthesis approach can be used effectively on individual textures to generate large texture images from small source samples, as in Tables 3.6, 3.7, and 3.8.

To synthesize an image that is compatible with the received texture mask, then, each patch must be drawn from a source texture corresponding to the class of that pixel in the mask. As each texture mask is decoded by the recipient, it is used to synthesize an image with high-resolution textures that approximates the original image. The quilting process, adapted from Efros and Freeman [30], proceeds as follows:

1. For a received texture mask with dimensions h x w, a patch size of t, and overlap o, initialize an empty destination image with dimensions (h · (t - o) + o, w · (t - o) + o).

2. Go through the received texture mask in raster scan order. For each pixel, search the source texture corresponding to that pixel's class for a set of patches that match the patches it overlaps within some error tolerance. Randomly pick one of these patches.

3. Compute the error surface between the newly chosen patch and the patches it overlaps. Find the minimum cost path along this surface and make that the
(a) Montastraea annularis (192 x 192); (b) Synthesized Montastraea annularis texture; (c) Synthesized Montastraea annularis texture with Minimum-Error Cut boundaries
(d) Montastraea annularis (256 x 256); (e) Synthesized Montastraea annularis texture; (f) Synthesized Montastraea annularis texture with Minimum-Error Cut boundaries
(g) Montastraea annularis (256 x 256); (h) Synthesized Montastraea annularis texture; (i) Synthesized Montastraea annularis texture with Minimum-Error Cut boundaries

Table 3.6: Synthesis Results for Montastraea annularis Textures. Synthesized texture is 504 x 504 pixels. Textures are synthesized from 24 x 24 pixel patches.
(a) Rubble (192 x 192); (b) Synthesized Rubble texture; (c) Synthesized Rubble texture with Minimum-Error Cut boundaries
(d) Rubble (256 x 256); (e) Synthesized Rubble texture; (f) Synthesized Rubble texture with Minimum-Error Cut boundaries
(g) Synthesized Combined Rubble texture, (24 x 24) patches, (504 x 504); (h) Synthesized Combined Rubble texture with Minimum-Error Cut boundaries

Table 3.7: Synthesis Results for Rubble Textures. Synthesized texture is 504 x 504 pixels. Textures are synthesized from 24 x 24 pixel patches. The third row represents synthesis from a combination of both rubble textures (a) and (d).
(a) Gorgonian (192 x 192); (b) Synthesized Gorgonian texture; (c) Synthesized Gorgonian texture with Minimum-Error Cut boundaries
(d) Brain Coral (256 x 256); (e) Synthesized Brain Coral texture; (f) Synthesized Brain Coral texture with Minimum-Error Cut boundaries
(g) Sand (256 x 256); (h) Synthesized Sand texture; (i) Synthesized Sand texture with Minimum-Error Cut boundaries

Table 3.8: Synthesis Results for Miscellaneous Textures. Synthesized texture is 504 x 504 pixels. Textures are synthesized from 24 x 24 pixel patches, with four pixels of overlap.
boundary of the new patch. Paste the patch onto the destination image.
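The geometry of step 1 and the minimum-error cut of step 3 can be sketched as below. This is a minimal illustration, assuming a per-pixel error surface over a vertical overlap strip (a horizontal overlap would use the same routine on the transposed surface); the function names are hypothetical, and the patch search of step 2 is omitted.

```python
import numpy as np


def output_shape(h, w, t, o):
    """Destination image size for an h x w mask, patch size t, and overlap o (step 1)."""
    return h * (t - o) + o, w * (t - o) + o


def min_error_vertical_cut(err):
    """Cheapest top-to-bottom seam through an overlap error surface (rows x cols),
    found by dynamic programming; returns the seam's column index in each row."""
    err = np.asarray(err, dtype=float)
    rows, cols = err.shape
    cost = err.copy()
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 2, cols)
            cost[r, c] += cost[r - 1, lo:hi].min()
    seam = [int(np.argmin(cost[-1]))]            # cheapest end point in the last row
    for r in range(rows - 2, -1, -1):            # backtrack upward through neighbors
        c = seam[-1]
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        seam.append(lo + int(np.argmin(cost[r, lo:hi])))
    return seam[::-1]


def paste_with_seam(existing, patch, seam):
    """Inside the overlap strip, keep existing pixels left of the seam, new ones from it on."""
    out = np.array(existing, copy=True)
    for r, c in enumerate(seam):
        out[r, c:] = patch[r, c:]
    return out
```

For the 25 x 25 masks used here, with t = 24 and o = 4, `output_shape` gives 504 x 504, matching the source images.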

For both the second and third stages of this algorithm, the error between patches was computed in a scaled Y'CbCr space. The Y' channel error is scaled by 1.3 and each chrominance channel error by 0.85, so luminance errors are weighted more heavily than chrominance errors. Three images generated using this full technique, including the minimum-cost paths between adjacent patches, are shown in Table 3.9. Because the current encoding does not transmit any color information explicitly, the color or appearance of the synthesized image may not match the original image. How closely the synthesized image matches the original depends on how broad the set of textures within one texture class is. Not transmitting color information keeps the compressed size as small as possible (0.0004 bits per pixel per color channel for this dataset, on average). If color information were sent as well, it could be used as a constraint during the texture synthesis phase, as described in Efros and Freeman's original paper [30]. For example, the color information of a low-resolution SPIHT image could be combined in this way with the texture information of a synthesized image.
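The following is a minimal pure-Python sketch of the weighted patch error and the minimum-error boundary cut. It is not the thesis implementation. It assumes that pixels are (Y', Cb, Cr) tuples and that the per-channel error is a squared difference; the text specifies only the 1.3 and 0.85 weights. The cut is the dynamic-programming seam from Efros and Freeman's image quilting. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (Y', Cb, Cr)

# Channel weights from the text: luminance error x1.3, each chroma error x0.85.
WEIGHTS = (1.3, 0.85, 0.85)


def error_surface(a: Sequence[Sequence[Pixel]], b: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Per-pixel weighted squared difference over an overlap region.

    Squaring the per-channel difference is an assumption; the text gives only the weights.
    """
    return [
        [sum(w * (p - q) ** 2 for w, p, q in zip(WEIGHTS, pa, pb)) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def patch_error(a: Sequence[Sequence[Pixel]], b: Sequence[Sequence[Pixel]]) -> float:
    """Scalar overlap cost used to rank candidate patches."""
    return sum(sum(row) for row in error_surface(a, b))


def min_error_vertical_cut(err: List[List[float]]) -> List[int]:
    """Column of the cheapest top-to-bottom seam through an (H x W) error surface.

    The seam may shift by at most one column per row, as in image quilting.
    """
    h, w = len(err), len(err[0])
    cost = [list(err[0])]
    for i in range(1, h):
        prev = cost[-1]
        cost.append([err[i][j] + min(prev[max(j - 1, 0):min(j + 2, w)]) for j in range(w)])
    # Backtrack from the cheapest cell in the bottom row.
    j = min(range(w), key=lambda k: cost[-1][k])
    path = [j]
    for i in range(h - 2, -1, -1):
        lo, hi = max(j - 1, 0), min(j + 2, w)
        j = min(range(lo, hi), key=lambda k: cost[i][k])
        path.append(j)
    return path[::-1]
```

In the Table 3.8 configuration, a four-pixel left overlap on a 24 × 24 patch gives a 24 × 4 error surface for the vertical cut. With 25 patches per side at a stride of 20 pixels, the output is 24 + 24 × 20 = 504 pixels wide. A horizontal cut for top overlaps follows by transposing the error surface.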

3.4  Discussion 

In this chapter I have laid out several options for compressing telemetry from AUVs, including methods applicable to both environmental data and imagery. For imagery, I have illustrated a range of encoding options: treating summary statistics as time-series, communicating texture information via Image Synthesis, or transmitting full images with embedded wavelet compression. Selecting the appropriate technique from these options requires an understanding of the problem domain, but the techniques are complementary. For typical AUV missions, a combination of all the approaches described in this thesis may be appropriate.
[Table 3.9 columns: Final, Synthesized, Mask, Source]


Table 3.9: Three full 504 × 504 pixel images synthesized using this approach from 24 × 24 pixel patches, with four pixels of overlap between adjacent patches.
CHAPTER 4


CAPTURE  Architecture 


Having outlined the approach to compression and relay communication underlying this work, I describe here how to integrate those two components into a full end-to-end communication architecture. This architecture relays acquired data from an AUV across an acoustic network of marine vehicles. It is nicknamed CAPTURE: a Communications Architecture using Progressive Transmission via Underwater Relays and Eavesdroppers. CAPTURE relies on progressive transmission to communicate data as a sequence of gradually improving "previews". Operators can request higher-quality versions of these previews, up to an error-free reconstruction, either immediately or at any later time during a mission. CAPTURE has been designed to facilitate efficient multi-hop relay communication across a small group of vehicles, which may come from different manufacturers or run different software architectures.

4.1  Overview 

CAPTURE consists of four distinct components, shown in Fig. 4-1.

First, the AUV acquires a set of data, and a platform-specific driver registers it with the telemetry system as a transmittable resource. Examples of possible resources include a single image or a time-series of measurements from a single sensor. The platform-specific drivers isolate the telemetry system from the specific capabilities or limitations of each host vehicle.

Second, new resources are automatically selected for compression and transmission to the surface, or existing resources are selected for further transmission based on requests from the surface. Automatic selection provides an avenue for high-level algorithms, such as mine identification or other interest operators, to guide the selection of interesting telemetry.

Third, selected resources are compressed using progressive coding methods. Progressive coding methods, specifically those that are fully embedded, ensure that an approximation to the data can be reconstructed with each newly received bit.

Finally, the transmission of the resource to the surface is managed to ensure end-to-end delivery. When multiple underwater vehicles are available, intermediate vehicles can relay data to the surface as hops in the route, or help through 'eavesdropping'.

The flow of data between the four subsystems is shown in detail in Fig. 4-2, and each subsystem is described in the following sections.

Figure 4-1: High-level overview of data flow through the four main components of CAPTURE. Platform drivers feed acquired data into the CAPTURE system, where it is winnowed down, compressed, and eventually transmitted to the surface.
Figure 4-2: A detailed view of the interactions between CAPTURE components, particularly the role of the platform drivers. Blue indicates the path of a resource (such as an image) through CAPTURE; red and green indicate vehicle state and control messages bypassing the majority of CAPTURE components.


4.2  Platform  Drivers 

The  platform  drivers  provide  an  interface  to  the  existing  software  on  each  different 
vehicle  platform.  Software  architectures  vary  significantly  from  vehicle  to  vehicle, 
as  do  sensing  and  computation  capabilities.  Platform  drivers  smooth  over  these 
architectural  differences  by  providing: 

•  An interface for data transmission and reception via the modem,

•  Configuration  of  resource  registration  and  prioritization, 

•  Handling  of  non-CAPTURE  acoustic  traffic,  such  as  command  and  control 
messages,  and 

•  Logging  support  via  LCM  [50]. 
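The driver responsibilities listed above can be sketched in Python as follows. The class and method names are hypothetical and are not CAPTURE's or Goby's API. The modem calls are left abstract (in the field tests, Goby supplied that layer), and LCM logging is omitted. A toy loopback subclass stands in for real modem hardware.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Resource:
    resource_id: int
    sensor: str
    data: bytes


class PlatformDriver(ABC):
    """Hypothetical driver interface; names are illustrative, not CAPTURE's actual API."""

    def __init__(self, modem_id: int) -> None:
        self.modem_id = modem_id                   # unique integer modem identifier
        self.priorities: Dict[str, int] = {}       # pre-mission sensor priorities
        self.resources: List[Resource] = []
        self.handlers: Dict[str, Callable[[bytes], None]] = {}

    # Modem interface, provided in practice by a vendor-neutral layer.
    @abstractmethod
    def send(self, frame: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> List[bytes]: ...

    # Resource registration and prioritization configuration.
    def register_sensor(self, sensor: str, priority: int) -> None:
        self.priorities[sensor] = priority

    def register_resource(self, sensor: str, data: bytes) -> Resource:
        if sensor not in self.priorities:
            raise KeyError(f"sensor {sensor!r} was not registered")
        resource = Resource(len(self.resources), sensor, data)
        self.resources.append(resource)
        return resource

    # Non-CAPTURE traffic (e.g. aborts, mission changes) goes to vehicle handlers.
    def on_message(self, kind: str, handler: Callable[[bytes], None]) -> None:
        self.handlers[kind] = handler

    def dispatch(self, kind: str, payload: bytes) -> bool:
        handler = self.handlers.get(kind)
        if handler is None:
            return False
        handler(payload)
        return True


class LoopbackDriver(PlatformDriver):
    """Toy driver that records outgoing frames instead of using a real modem."""

    def __init__(self, modem_id: int) -> None:
        super().__init__(modem_id)
        self.sent: List[bytes] = []

    def send(self, frame: bytes) -> None:
        self.sent.append(frame)

    def receive(self) -> List[bytes]:
        return []
```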

Physical connections to the vehicle's acoustic modem vary, but most modems are connected to an RS-232 serial port. Each modem manufacturer provides a different software interface to the end user. No common API is yet shared across manufacturers, comparable to the Hayes/AT command set that shaped the course of terrestrial telephone modems. Webster et al. previously developed a modem abstraction layer for the WHOI MicroModem [127]. More recently, the Goby Autonomy Project [100] has made advances in developing a generic abstraction for acoustic modems and implementing drivers for physical modem hardware. These drivers allow software to operate independently of the modems' underlying proprietary command languages. Goby was used in the field experiments to provide a low-level, vendor-neutral interface to the acoustic modem.

AUVs will require some configuration, such as information about any acoustic range-based navigation systems in use, or the specification of a fixed MAC communication cycle. Each modem requires a unique integer identifier, typically specified as part of the configuration. That configuration is performed through the platform driver. The platform driver is also responsible for registering existing sensors, such as cameras, sonars, and CTDs, as resource generators. The importance of different resources will vary by mission and vehicle, so users may need to configure their prioritization before the mission. Some vehicles may register only a single camera and transmit imagery. Other vehicles may switch between multiple sensors, such as a camera and a CTD, selecting between the resources during the prioritization phase. The driver also delivers command and control messages, such as vehicle aborts or mission changes, to the appropriate handlers.

4.3  Resource  Prioritization 

Modern AUV platforms generate orders of magnitude more data than could possibly be transmitted to the surface, so the first task facing any telemetry system is to prioritize which data should be transmitted. At any given time, surface operators can choose either to request refinement of a specific resource or to allow the vehicle to select new resources for transmission automatically. For vehicles with multiple sensors of interest, the transmissions must also be multiplexed between those sensors. These steps can be quite simple, such as always sending the most recent resource registered by a single sensor. More complex missions may involve significant computation in this step, such as identifying seafloor mines through image analysis.

Yogesh Girdhar has published a sequence of papers [39, 40, 41, 42] on identifying a subset of images that appropriately describes an entire collection, similar to the problem of selecting an appropriate subset for a slideshow. These papers include both offline methods, useful for identifying a summary subset after a dive, and online methods. The online identification method (nicknamed ONSUM) builds the summary set as new images are being acquired, making it ideally suited to telemetry prioritization. These methods have been tested with several datasets, including one from a small shallow-water AUV. Thompson et al. [118] also have earlier work on optimal telemetry prioritization for the Zoe autonomous rover, based on Hidden Markov Models.

Multiplexing of multiple sensors on a single vehicle could be done with round-robin scheduling, priority queues, or computed metrics. A single image is easy to treat as a distinct 'resource'. Environmental sensor data, by contrast, currently has to be broken into fixed-length segments for transmission. This is best done with long sequences of data at a time to maximize compression, as discussed in Sec. 3.2.2.
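The following sketch illustrates the round-robin option. It rotates between sensors and offers each sensor's most recent untransmitted resource, so newer data supersedes older data from the same sensor. The class name and the superseding policy are assumptions made for the example. A priority queue or a computed metric could replace the rotation.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple


class RoundRobinMultiplexer:
    """Hypothetical multiplexer: rotate over sensors, offering each one's latest resource."""

    def __init__(self, sensors: Iterable[str]) -> None:
        self.order: Deque[str] = deque(sensors)
        self.latest: Dict[str, Optional[int]] = {s: None for s in self.order}

    def register(self, sensor: str, resource_id: int) -> None:
        # Newer data from a sensor supersedes any older, not-yet-sent resource.
        self.latest[sensor] = resource_id

    def next_resource(self) -> Optional[Tuple[str, int]]:
        # Visit each sensor at most once, starting after the last one served.
        for _ in range(len(self.order)):
            sensor = self.order[0]
            self.order.rotate(-1)
            rid = self.latest[sensor]
            if rid is not None:
                self.latest[sensor] = None
                return sensor, rid
        return None
```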

4.4  Progressively  Encoded  Compression 

After a resource has been identified for transmission to the surface, it must be compressed to maximize the throughput of the channel. CAPTURE relies on progressively coded compression methods, preferably fully embedded ones. CAPTURE transmits enough data to the surface to reconstruct a low-quality "preview" of each automatically selected resource before moving on to a new resource. Because the encoding is progressive, each new piece of data received at the surface allows an increasingly higher-quality representation of the resource to be reconstructed. This serves two equally important purposes. If the "preview" piques the operator's interest, the operator can request more encoded data from that resource to refine the already-transmitted data with no wasted transmissions: every byte sent up for the preview is used as the basis for the higher-quality version. If, on the other hand, the resource is uninteresting, the operator may be able to determine that after only a few transmissions and avoid wasting further bandwidth to deliver a full preview.
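CAPTURE relies on the property that any prefix of the encoded stream decodes to a usable approximation. The toy bit-plane coder below illustrates that prefix property and nothing more; it is far simpler than SPIHT. It sends the most significant bit of every sample first. The function names are hypothetical.

```python
from typing import List


def encode_bitplanes(samples: List[int], bits: int) -> List[int]:
    """Emit one bit per sample for each bit plane, most significant plane first."""
    if any(not 0 <= s < (1 << bits) for s in samples):
        raise ValueError("samples must be non-negative and fit in `bits` bits")
    return [(s >> plane) & 1 for plane in range(bits - 1, -1, -1) for s in samples]


def decode_prefix(prefix: List[int], n: int, bits: int) -> List[int]:
    """Reconstruct n samples from any prefix of the stream; unreceived bits count as 0."""
    out = [0] * n
    for t, bit in enumerate(prefix):
        plane = bits - 1 - t // n      # which bit plane this stream position belongs to
        out[t % n] |= bit << plane     # which sample it refines
    return out
```

Each additional bit only adds to the estimate already decoded. This is why bytes sent for a preview are never wasted when refinement is later requested.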

4.5  Multi-Hop  Networking 


Figure 4-3: A large CAPTURE network, including multiple hops and eavesdroppers. The vehicle selecting resources for transmission is known as the 'origin', and the surface ship is the 'endpoint'.

A CAPTURE network consists of multiple nodes, as shown in Fig. 4-3: an origin, an endpoint, zero or more ordered hops, and possibly some eavesdroppers. The origin captures resources, such as photographs, and hops relay them to the endpoint. The network can operate in either an automatic selection mode, where the origin selects transmitted resources automatically, or a refinement mode.

In automatic selection mode, enough data is relayed for the endpoint to reconstruct a low-quality preview of a resource. When the origin learns that the endpoint has received enough segments to reconstruct a preview, it automatically selects a new resource for transmission. Since the origin waits until the endpoint has confirmed reception before moving on to a new resource, more data may be transmitted than is required to generate a preview.

The endpoint, typically a manned surface ship, can request that the network instead operate in refinement mode. In refinement mode, data continues to be transmitted for a specific, previously transmitted resource selected by the endpoint. The origin and hops relay additional data from that resource until the endpoint puts the network back into automatic selection mode.
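The origin's behavior in the two modes can be summarized as a small state machine. This sketch makes several assumptions:

- The preview size is fixed, measured in contiguous segments held at the endpoint.
- The caller supplies a `select_next` function (for example, "most recent image").
- Returning to automatic mode triggers a fresh selection.

All names are hypothetical.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    AUTOMATIC = "automatic selection"
    REFINEMENT = "refinement"


class OriginController:
    """Hypothetical origin-side logic for the two operating modes."""

    def __init__(self, preview_segments: int, select_next: Callable[[], int]) -> None:
        self.preview_segments = preview_segments  # segments needed for a preview (assumed fixed)
        self.select_next = select_next            # e.g. an interest operator or "most recent"
        self.mode = Mode.AUTOMATIC
        self.active = select_next()

    def on_ack(self, resource_id: int, endpoint_contiguous: int) -> None:
        """The endpoint reports how many leading segments of resource_id it holds."""
        if resource_id != self.active or self.mode is not Mode.AUTOMATIC:
            return  # stale acknowledgement, or refinement: keep sending the same resource
        if endpoint_contiguous >= self.preview_segments:
            self.active = self.select_next()

    def request_refinement(self, resource_id: int) -> None:
        self.mode = Mode.REFINEMENT
        self.active = resource_id

    def resume_automatic(self) -> None:
        self.mode = Mode.AUTOMATIC
        self.active = self.select_next()
```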

4.6  Network  Protocol 

CAPTURE uses two types of network messages to communicate information between nodes: Chunk and Control messages. The bulk of traffic in a CAPTURE network consists of Chunk messages.

4.6.1  Chunk  Messages 

Even after compression, resources will likely be too large for a single transmission by today's acoustic modems, and thus must be broken into segments. Each Chunk message consists of a single segment of data, an identifier for the segment's position within the resource, and a unique identifier for the resource itself. Chunk messages are designed to stand alone: any vehicle receiving a message can uniquely identify the resource the segment belongs to, and the segment's position within it, without any additional knowledge. Segments are of a fixed size, which must be agreed upon within the network before deployment. The segment size should be based on the Maximum Transmission Unit (MTU) supported by the modem hardware. For the WHOI MicroModem, this could be 256, 512, or 2048 bits, depending on the level of error correction that is applied. In plain English, an example Chunk message could be: "The 4th segment of SeaBED's 33rd resource consists of the following data ..." Control messages, sometimes referred to as acknowledgements, are significantly more complicated.
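The following sketch packs a Chunk message into a single 512-bit (64-byte) frame. The header layout is a hypothetical choice for illustration, not the field definition shown in Fig. 4-5. It uses an 8-bit origin ID, a 16-bit resource number, and a 16-bit segment index, all big-endian. Zero-padding of the final segment is also an assumption.

```python
import struct
from dataclasses import dataclass
from typing import List

FRAME_BYTES = 64                 # one 512-bit modem frame
HEADER = struct.Struct(">BHH")   # hypothetical: origin ID, resource number, segment index
PAYLOAD_BYTES = FRAME_BYTES - HEADER.size


@dataclass(frozen=True)
class Chunk:
    origin_id: int
    resource_num: int
    segment: int
    payload: bytes

    def pack(self) -> bytes:
        if len(self.payload) > PAYLOAD_BYTES:
            raise ValueError("payload exceeds one segment")
        body = self.payload.ljust(PAYLOAD_BYTES, b"\x00")  # zero-pad the final segment
        return HEADER.pack(self.origin_id, self.resource_num, self.segment) + body

    @staticmethod
    def unpack(frame: bytes) -> "Chunk":
        origin_id, resource_num, segment = HEADER.unpack_from(frame)
        return Chunk(origin_id, resource_num, segment, frame[HEADER.size:])


def segment_resource(origin_id: int, resource_num: int, data: bytes) -> List[Chunk]:
    """Split an encoded resource into self-describing, fixed-size Chunk messages."""
    return [
        Chunk(origin_id, resource_num, i, data[off:off + PAYLOAD_BYTES])
        for i, off in enumerate(range(0, len(data), PAYLOAD_BYTES))
    ]
```

With this layout and zero-based segment indices, "the 4th segment of SeaBED's 33rd resource" would carry resource number 33 and segment index 3. The origin ID is whatever ID SeaBED's modem was configured with.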

4.6.2  Control  Messages 

The second type of message used in a CAPTURE network is a Control message. Control messages contain a variety of data used to synchronize the state of the network between nodes, including acknowledgement and routing information. Like Chunk messages, Control messages include the identifier of the resource currently being transmitted by the network, but they otherwise serve a different purpose. Each network node tracks, for each resource, the segments it believes every other node to possess. The primary purpose of Control messages is to convey partial estimates of these 'segment masks' between CAPTURE nodes, acting as a selective acknowledgement. In particular, the message indicates the highest known index of the endpoint's contiguously received segments. It also encodes a bitmask indicating which segments beyond that index are known to be possessed by network hops or the endpoint. Control messages also identify whether the network is operating in refinement mode or automatic selection mode. One possible Control message might be: "The route consists of SeaBED, vehicle A, vehicle B, and the endpoint. SeaBED's 33rd resource is being refined by request from the surface. The endpoint has received the first 9 contiguous segments. Beyond the 9th segment, the hops and endpoint are known to have received the following segments: ...".
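A node can compute the selective acknowledgement from its segment-mask estimates. The acknowledgement has two parts: a count of contiguous segments held by the endpoint, and a fixed-width bitmask covering the segments that follow. The mask width and function names are assumptions. Bit k of the mask refers to segment `base + k`, with zero-based indices.

```python
from typing import Set, Tuple


def contiguous_count(segments: Set[int]) -> int:
    """Number of leading segments 0, 1, 2, ... present without a gap."""
    n = 0
    while n in segments:
        n += 1
    return n


def build_ack(endpoint_has: Set[int], downstream_has: Set[int], width: int) -> Tuple[int, int]:
    """Return (base, mask): the endpoint holds segments 0..base-1, and bit k of mask
    is set if segment base + k is held by the endpoint or any downstream hop.
    Segments at or beyond base + width are not representable and are omitted."""
    base = contiguous_count(endpoint_has)
    known = endpoint_has | downstream_has
    mask = 0
    for k in range(width):
        if base + k in known:
            mask |= 1 << k
    return base, mask


def decode_ack(base: int, mask: int, width: int) -> Set[int]:
    """Segments beyond the contiguous run that the acknowledgement reports as held."""
    return {base + k for k in range(width) if (mask >> k) & 1}
```

In the example above, an endpoint holding segments 0 to 8 yields `base = 9`, and the mask reports which of segments 9 onward are held downstream.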

Control messages also include the current route from the origin to the endpoint, and a revision ID. The endpoint can alter this route, or select a different vehicle as the origin, by incrementing the revision ID. The route consists of the hardware IDs, in order, of the nodes currently in the network: (origin, hop a, ..., endpoint). This routing information would add substantial overhead in traditional networks, but the overhead is minimal for small numbers of vehicles. In such networks, a single node can be identified by a few bits, and a route can be expressed in a byte or two.




4.6.3  Message  Handling 

Since the ocean is a broadcast medium, messages may 'skip' any individual hop in a network, or even travel directly from origin to endpoint. There is no guarantee or requirement that each message be relayed by every node in the route. When a message is received, some of its components may be ignored depending on its source. In particular, depending on the field and the current mode, some data is assumed to be valid only if it comes from upstream (closer to the origin) or only if it comes from downstream (closer to the endpoint). For example, both Chunk and Control messages contain a resource ID. If the network is believed to be in automatic selection mode, that resource ID is taken to be the currently active resource only if it came from upstream. If the network is in refinement mode, the resource ID is taken as the active ID only if it came from downstream. This allows the origin to control the transmission of automatically selected resources. In refinement mode, it also propagates resources requested from the surface towards the origin.

When a Chunk message is received, the data segment it contains is stored at the appropriate offset in the local copy of the resource. The receiving node also records that the transmitter holds that segment.

Any node receiving a Control message first incorporates the included segment masks into its own segment masks. If the message was transmitted by the immediate downstream neighbor, the node also adopts the current operating mode (automatic selection or refinement) from the message. Finally, if the route revision in the message is higher than that of the currently stored route, the local copy of the routing information is updated.
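The handling rules above can be collected into a single per-node state object. This is a hypothetical sketch with these assumptions:

- Segment masks are Python sets of (resource ID, segment index) pairs rather than packed bitmasks.
- Nodes outside the route, such as eavesdroppers, count as neither upstream nor downstream.
- The resource-ID rule runs after the mode update, so a refinement request from downstream takes effect immediately.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Seg = Tuple[int, int]  # (resource ID, segment index)


@dataclass
class CaptureNode:
    """Hypothetical per-node state implementing the message-handling rules."""

    my_id: int
    route: List[int]                       # (origin, hop a, ..., endpoint)
    revision: int = 0
    refinement: bool = False               # False means automatic selection mode
    active: Optional[int] = None           # currently active resource ID
    store: Dict[Seg, bytes] = field(default_factory=dict)
    holdings: Dict[int, Set[Seg]] = field(default_factory=dict)  # node -> segments held

    def _relation(self, sender: int) -> Optional[str]:
        """'up' if sender is closer to the origin, 'down' if closer to the endpoint."""
        if sender not in self.route or self.my_id not in self.route:
            return None
        return "up" if self.route.index(sender) < self.route.index(self.my_id) else "down"

    def _maybe_adopt(self, sender: int, resource_id: int) -> None:
        rel = self._relation(sender)
        if (not self.refinement and rel == "up") or (self.refinement and rel == "down"):
            self.active = resource_id

    def on_chunk(self, sender: int, resource_id: int, segment: int, data: bytes) -> None:
        self._maybe_adopt(sender, resource_id)
        self.store[(resource_id, segment)] = data
        self.holdings.setdefault(sender, set()).add((resource_id, segment))

    def on_control(self, sender: int, resource_id: int, masks: Dict[int, Set[Seg]],
                   refinement: bool, route: List[int], revision: int) -> None:
        # 1. Merge the sender's segment-mask estimates into our own.
        for node, segs in masks.items():
            self.holdings.setdefault(node, set()).update(segs)
        # 2. Take the operating mode only from the immediate downstream neighbor.
        if self.my_id in self.route:
            i = self.route.index(self.my_id)
            if i + 1 < len(self.route) and self.route[i + 1] == sender:
                self.refinement = refinement
        # The resource-ID rule applies to Control messages as well.
        self._maybe_adopt(sender, resource_id)
        # 3. Adopt a newer route if the endpoint has revised it.
        if revision > self.revision:
            self.route, self.revision = list(route), revision
```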

4.6.4  Transmission  Scheduling 

Which messages a network node transmits depends on the node's role in the CAPTURE network, as shown in Table 4.1 below.

Endpoint: Control
Hop: Chunk, Control
Origin: Chunk
Eavesdropper: Chunk, Control

Table 4.1: Message types transmitted by each of the four node classes.

When transmitting a Chunk message, a node should use the segment masks for downstream nodes to select what is transmitted. Early resource segments that have not been received by any node closer to the endpoint have the highest priority. In particular, nodes should start by transmitting the earliest segment of the active resource that no downstream node is believed to possess. They should then continue in-order transmission of any later segments not held by downstream nodes. When a Control message is received from a downstream node, this process starts over by transmitting the earliest segment now known to be missing.
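The in-order scheduling rule, combined with a fixed ratio of Chunk to Control messages, can be sketched as a planner that is re-run whenever a Control message arrives from downstream. The default of four Chunk slots per Control slot comes from the low end of the range discussed next (one Control message per four to eight Chunk messages). The function names and the `sends_control` switch, for nodes that send only Chunk messages, are assumptions.

```python
from typing import Iterable, List, Optional, Set, Tuple

Slot = Tuple[str, Optional[int]]


def missing_downstream(have: Set[int], downstream_has: Iterable[Set[int]]) -> List[int]:
    """Segments this node holds that no downstream node is believed to hold, earliest first."""
    held = set().union(*downstream_has)
    return sorted(s for s in have if s not in held)


def plan_cycle(have: Set[int], downstream_has: Iterable[Set[int]],
               chunks_per_control: int = 4, sends_control: bool = True) -> List[Slot]:
    """Plan up to `chunks_per_control` Chunk slots, then (optionally) one Control slot.

    Re-plan whenever a Control message arrives from downstream, so transmission
    restarts at the earliest segment still believed missing.
    """
    todo = missing_downstream(have, downstream_has)[:chunks_per_control]
    plan: List[Slot] = [("chunk", s) for s in todo]
    if sends_control:
        plan.append(("control", None))
    return plan
```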

Using the simulation parameters described in Section 2.3, Fig. 4-4 illustrates how sensitive a CAPTURE network is to the frequency of acknowledgement messages relative to segments of data. When the ratio is high and acknowledgements are sent infrequently, the origin is likely to transmit more data than necessary before moving on to a new resource. However, transmission rates seem to be relatively insensitive to this scheduling near the minimum, which lies between one Control message for every four Chunk messages and one for every eight.

4.6.5  Implementation 

Each of the autonomous platforms had a platform driver developed to fit the needs of its specific software environment. A number of revisions to Goby [100] were made as part of this work, allowing it to be used as a software abstraction layer for the acoustic modem on each vehicle. These revisions have now been incorporated into Goby v2.0. The implementation of the CAPTURE network protocol relied on two packed message types, representing the Chunk and Control messages. These messages were constructed as 512-bit messages to fit the requirements of the physical layer. The specific message definitions that were used are shown in Figs. 4-5 and 4-6.
Figure 4-4: Illustration of network sensitivity to the ratio of Chunk messages to Control messages. Fig. 4-4a shows the mean time required to receive each resource preview for a fixed preview size of 1600 bytes. Fig. 4-4b shows the final average size of the previews. In total, the link was 12 km long and consisted of five evenly spaced nodes (four hops).
Figure 4-5: Definition for Chunk messages used during 2011 field experiments in Buzzards Bay. The numerical scale across the top displays the number of bits. Each subsequent row represents additional bits which continue from the row above.
Figure 4-6: Definition for Control messages used during 2011 field experiments in Buzzards Bay. The numerical scale across the top displays the number of bits. Each subsequent row represents additional bits which continue from the row above.

The entire route is encoded in a twelve-bit field of the Control message, plus three bits for the revision ID that allows the route to be changed. As a result, this implementation supports routes containing up to four vehicles, and networks containing seven vehicles in total. This could easily be expanded for longer routes at the cost of only a few additional bits.
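The route-field arithmetic can be checked with a small packing sketch. It assumes four 3-bit ID slots (4 × 3 = 12 bits) preceded by a 3-bit revision ID. It also assumes that ID 0 is reserved to mark an empty slot, which would explain why seven vehicles (2^3 - 1) are supported. The slot order and function names are hypothetical.

```python
from typing import List, Tuple

ID_BITS = 3        # 3-bit vehicle IDs; 0 is assumed to mark an empty slot -> 7 usable IDs
ROUTE_SLOTS = 4    # 4 slots x 3 bits = the 12-bit route field
REV_BITS = 3       # 3-bit route revision ID
ID_MASK = (1 << ID_BITS) - 1


def pack_route(route: List[int], revision: int) -> int:
    """Pack (revision, route) into a 15-bit integer, revision in the top bits."""
    if not 2 <= len(route) <= ROUTE_SLOTS:
        raise ValueError("route must contain between 2 and 4 nodes")
    if any(not 1 <= v <= ID_MASK for v in route):
        raise ValueError("vehicle IDs must be in 1..7")
    if not 0 <= revision < (1 << REV_BITS):
        raise ValueError("revision must fit in 3 bits")
    word = revision
    for slot in range(ROUTE_SLOTS):
        word = (word << ID_BITS) | (route[slot] if slot < len(route) else 0)
    return word


def unpack_route(word: int) -> Tuple[List[int], int]:
    """Inverse of pack_route: drop empty (zero) slots and recover the revision."""
    ids = [(word >> (ID_BITS * (ROUTE_SLOTS - 1 - slot))) & ID_MASK for slot in range(ROUTE_SLOTS)]
    return [v for v in ids if v != 0], word >> (ID_BITS * ROUTE_SLOTS)
```

A 3-bit revision wraps after eight route changes. A deployed system would therefore need a wrap-aware comparison when deciding whether a received revision is "higher".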

The Chunk and Control messages both contain a time-of-launch field, which encodes the second of transmission. All vehicles in the Buzzards Bay experiment were equipped with a high-precision, low-drift clock [33]. Once each vehicle's clock is synchronized at the surface, every node can passively measure the one-way travel time (OWTT) of each acoustic broadcast by comparing the encoded time-of-launch with the observed time-of-arrival. Given the sound speed profile of the water column, the inter-vehicle range can then be computed directly. Over time, vehicles within the network can augment each other's navigation estimates using these additional range constraints.
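The passive ranging step reduces to multiplying the one-way travel time by a sound speed. The sketch assumes that each broadcast starts exactly on the encoded whole second of the synchronized clock. It uses a constant nominal 1500 m/s in place of the measured sound speed profile. The names are hypothetical.

```python
NOMINAL_SOUND_SPEED = 1500.0  # m/s; assumed constant stand-in for a measured profile


def owtt_range(launch_second: int, arrival_time: float,
               sound_speed: float = NOMINAL_SOUND_SPEED) -> float:
    """Range in meters from a broadcast assumed to start exactly at the encoded second.

    Both times are read from the synchronized vehicle clocks, in seconds.
    """
    travel = arrival_time - launch_second
    if travel < 0:
        raise ValueError("arrival precedes launch; clocks are not synchronized")
    return sound_speed * travel
```

At 1500 m/s, each millisecond of clock error corresponds to 1.5 m of range error, which is why a low-drift clock matters here.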
CHAPTER 5


Field  Results 


CAPTURE has been field tested in three distinct experiments and four different network configurations, as shown in Fig. 5-1. All told, these experiments involved six distinct autonomous platforms, including two different SeaBED AUVs, two different OceanServer Iver AUVs, and a Bluefin 9 AUV. In addition, four manned surface ship platforms have been used, involving researchers from NOAA, MIT, WHOI, Northeastern University, University of Michigan, and Bluefin Robotics Corporation.

In February of 2010, an early version of the CAPTURE architecture was tested on Lucille, a SeaBED-class [104] AUV owned by NOAA, during a research expedition aboard the NOAA Ship Oscar Elton Sette. A single dive was performed near Rota, an island in the Northern Marianas Archipelago [72], at depths between 100 and 350 meters. No specific constraints were put on the surface ship, which remained within 600 meters of the vehicle throughout the dive.

In late May of 2011, CAPTURE was extended to operate on a Bluefin 9 AUV equipped with a "backseat driver" computation stack running the MOOS software suite. That vehicle is part of ongoing Mine Counter-Measures development, which seeks to identify seafloor mine-like objects and transmit their sonar signatures to the surface for confirmation [73].

In August of 2011, CAPTURE was tested on three autonomous platforms and one manned platform operating simultaneously. Two OceanServer Iver AUVs, with payload and navigation suites custom-developed by the University of Michigan [32], provided long-range mid-water-column survey capability. A SeaBED AUV provided the ability to capture detailed low-altitude photographic surveys. A photograph of these vehicles is shown in Fig. 5-2. These platforms were coupled with a manned surface ship, the R/V Tioga, and a number of dives were performed in Buzzards Bay, Massachusetts.

[Figure 5-1 panels include: (c) Three Hop; (d) Route Switch]

Figure 5-1: Network configurations which have successfully been used in the field with CAPTURE. In the fourth example, the vehicle responsible for initiating transmissions was changed mid-dive, in response to a request from the surface.

Figure 5-2: Vehicles used during the CAPTURE '11 Experiment. The two Iver AUVs are visible center and right, with the SeaBED AUV on the left.

5.1  Platform  Driver  /  Resource  Acquisition 

The Lucille AUV used during the 2010 field experiment is equipped with a five-megapixel Prosilica color camera featuring a high-dynamic-range CCD. During the 2010 field experiment, this camera captured one color image every five seconds at a resolution of 2048 × 2048 pixels. The raw, Bayer RGGB-encoded images were processed and converted to the Y'UV colorspace onboard the AUV's main control computer, resulting in 1024 × 1024 pixel full-color images.
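The text does not specify how the 2048 × 2048 Bayer frames were reduced to 1024 × 1024 color images. One simple approach that produces exactly this halving is to collapse each RGGB 2 × 2 quad into a single RGB pixel, averaging the two greens. The result is then converted with the standard BT.601-weighted Y'UV equations. The sketch below illustrates that assumption and is not the onboard implementation.

```python
from typing import List, Tuple

YUV = Tuple[float, float, float]


def rggb_quads_to_yuv(raw: List[List[float]]) -> List[List[YUV]]:
    """Collapse each 2x2 RGGB quad to one pixel, then convert RGB to Y'UV (BT.601 weights)."""
    h, w = len(raw), len(raw[0])
    if h % 2 or w % 2:
        raise ValueError("raw frame dimensions must be even")
    out: List[List[YUV]] = []
    for i in range(0, h, 2):
        row: List[YUV] = []
        for j in range(0, w, 2):
            r = raw[i][j]                              # top-left: red
            g = (raw[i][j + 1] + raw[i + 1][j]) / 2    # average of the two greens
            b = raw[i + 1][j + 1]                      # bottom-right: blue
            y = 0.299 * r + 0.587 * g + 0.114 * b
            row.append((y, 0.492 * (b - y), 0.877 * (r - y)))
        out.append(row)
    return out
```

A 2048 × 2048 raw frame passed through this function yields a 1024 × 1024 grid of Y'UV pixels.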

The Bluefin 9 AUV used during the brief mine counter-measure experiment is equipped with a MarineSonic sidescan sonar system. After a fixed number of scanlines, the sonar generates 2D imagery in a proprietary TIFF-like format. A platform driver was developed to read the imagery from the sonar and to interface with the onboard MOOS autonomy software. Goby software was used to abstract the interface with the on-board WHOI MicroModem. The mission was very short, leaving time to transmit only a single sonar image from the AUV to the surface, shown in Fig. 5-3.

Figure 5-3: Bluefin 9 AUV prior to deployment, and the sonar imagery transmitted during the dive.


5.2  Resource  Prioritization 

To date, our field experiments have relied on a single-resource queuing model to identify the next resource for transmission. The Lucille AUV used during the 2010 field experiment, shown in Fig. 5-4, has a single onboard CPU used for both CAPTURE and vehicle control. To minimize the risk of overloading the CPU's limited resources, the most recently captured photograph was compressed every three minutes. As a result, several images were compressed but never transmitted, but new data was always available for transmission. When CAPTURE was ready to transmit a new resource to the surface, it selected the most recently compressed new image.

Figure 5-4: Top: Lucille, a SeaBED AUV, prior to launch near Rota, 2500 km south of Tokyo. Bottom: Transmission progress overlaid on bathymetry.

5.3  Progressive  Encoding 

The photographic and sonar imagery acquired by the Lucille and Bluefin AUVs, respectively, was compressed using SPIHT compression in conjunction with the Cohen-Daubechies-Feauveau 9/7 wavelet [21]. The MarineSonic sonar source imagery was a grayscale image of 1024 × 960 pixels in a proprietary format. For the color photographic imagery captured by the Lucille AUV, 50% of the encoded data stream was allocated to luminance data and 50% to chrominance data. In retrospect, allocating a higher proportion to luminance data would have resulted in more visually pleasing imagery.

During the 2010 field experiment, fifteen color photographs were received over a 3.75-hour period. That works out to about fifteen minutes per image, or approximately 35 bits per second. Four of the fifteen images were captured during descent or ascent and were completely black as a result; the eleven non-black images are shown in Figs. 5-5 and 5-6. This low rate is largely due to packet loss and scheduling in real-world conditions. In addition, to obtain richer statistics on transmission success, the modem varied the level of forward error correction it applied. It alternated between encodings with maximum theoretical burst rates of 520 and 5400 bits per second.
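The reported rate follows from the image count and duration. The per-image payload below is implied by those figures rather than separately reported:

$$\frac{3.75\ \text{h} \times 3600\ \text{s/h}}{15\ \text{images}} = 900\ \text{s per image}, \qquad 35\ \text{bit/s} \times 900\ \text{s} = 31{,}500\ \text{bits} \approx 3.9\ \text{kB per image}.$$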

During the August 2011 CAPTURE field experiment, extremely murky water conditions prevented capturing photographs, so pre-captured imagery was used instead. In addition, one test was performed with a non-progressively encoded dataset. A short segment of audio, Neil Armstrong's first words on the surface of the Moon, was compressed with the Speex voice codec to 4368 bytes. The audio was then encrypted using AES with a 256-bit key. Once the full set of encoded packets had been received, the audio was decoded and successfully played.

Figure 5-5: The first six color images captured by the SeaBED-class AUV, compressed in-situ, and transmitted to surface operators.

Figure 5-6: The final five non-black images returned by the SeaBED vehicle.


5.4  Relay  Communication 


(a)  41  Segments  (b)  Log.  difference  (c)  97  Segments 

Figure 5-7: A transmitted grayscale photo before and after requesting additional refinement. The magnitude of the difference between the two versions is shown on a logarithmic scale to highlight changes.

Three separate successful CAPTURE dives were performed during the most recent field experiment, each testing different capabilities of the networking protocol. During one trial, data was communicated across a two-hop network, as shown in Fig. 5-1b. After six images had been sequentially transmitted as 2048-byte previews, the surface operator identified the fourth transmitted image as warranting further refinement. Upon request, the transmitting vehicle returned to that image and provided additional data to refine it, as shown in Fig. 5-7. The origin eventually transmitted 5529 contiguous bytes of the image before being instructed to return to automatic selection.
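The interaction described above, automatic previews interrupted by an operator's refinement request, can be modeled by a small origin-side scheduler. The sketch below is a hypothetical illustration: the class and method names are invented, each resource is treated as an embedded bitstream whose prefixes are sent in order, the 64-byte chunk size is an assumption, and only the 2048-byte preview size comes from the trial.

```python
from typing import Dict, Optional, Tuple

class OriginScheduler:
    """Hypothetical origin-side scheduler for embedded (progressive) bitstreams."""

    def __init__(self, preview_bytes: int = 2048, chunk_bytes: int = 64) -> None:
        self.preview_bytes = preview_bytes
        self.chunk_bytes = chunk_bytes          # illustrative per-packet payload size
        self.resources: Dict[str, bytes] = {}   # insertion order = capture order
        self.sent: Dict[str, int] = {}          # contiguous bytes already sent
        self.focus: Optional[str] = None        # resource the operator asked to refine

    def add_resource(self, rid: str, bitstream: bytes) -> None:
        self.resources[rid] = bitstream
        self.sent[rid] = 0

    def request_refinement(self, rid: str) -> None:
        self.focus = rid

    def resume_automatic(self) -> None:
        self.focus = None

    def _limit(self, rid: str) -> int:
        # A refined resource may be sent in full; others stop after the preview.
        full = len(self.resources[rid])
        return full if rid == self.focus else min(self.preview_bytes, full)

    def next_chunk(self) -> Optional[Tuple[str, int, bytes]]:
        """Return (resource id, byte offset, data), or None when nothing is pending."""
        candidates = [self.focus] if self.focus is not None else list(self.resources)
        for rid in candidates:
            start = self.sent[rid]
            end = min(start + self.chunk_bytes, self._limit(rid))
            if start < end:
                self.sent[rid] = end
                return rid, start, self.resources[rid][start:end]
        return None

sched = OriginScheduler()
for n in range(1, 7):
    sched.add_resource(f"image{n}", bytes(8000))  # stand-in bitstreams
while sched.next_chunk():            # automatic mode: six 2048-byte previews
    pass
sched.request_refinement("image4")   # operator flags the fourth image
for _ in range(10):
    sched.next_chunk()               # ten more 64-byte chunks of image4 only
sched.resume_automatic()
print(sched.sent["image4"])          # 2688
```

Because each bitstream is embedded, refinement simply continues from the byte offset where the preview stopped; no part of the preview is retransmitted.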

A total of seven images were eventually transmitted, each decoded progressively, with gradually improving reconstructions over the course of the transmission. CAPTURE was also tested in a three-hop linear network, depicted in Fig. 5-1c, successfully relaying four images across the heterogeneous network.
In the final experiment, a two-hop network was employed, as shown in Fig. 5-1d. Four grayscale images were transmitted from an Iver AUV, via a SeaBED AUV, to the surface. The surface operator then requested a route change, granting the other Iver AUV responsibility for transmitting resources. That vehicle transmitted the pre-loaded encrypted speech, followed by another two grayscale photos.
CHAPTER 6


Discussion 


Relaying high-resolution scientific data from submerged AUVs over long horizontal distances faces numerous obstacles. In this thesis I have presented an analysis of these obstacles, along with the design of a unified solution in the form of CAPTURE. A networking and compression infrastructure, CAPTURE supports transmission and interactive refinement of sonar and photographic data, along with scalar environmental measurements.


6.1  Contributions 

The specific contributions and characteristics of this work can be divided into those relating to compression and data selection, and those related more directly to the CAPTURE networking architecture. To demonstrate the viability of this architecture, I presented both simulated results and real-world results from running CAPTURE software in three field experiments, in diverse environments, on SeaBED, OceanServer and Bluefin AUVs, each employing significantly different software architectures.

6.1.1  Compression 

While wavelets have previously been recognized as appropriate for underwater image coding, there are no known examples of using wavelet-based source coding for scalar telemetry. This work additionally represents the first application of fully embedded encodings to AUV telemetry, allowing decoding to halt after any number of contiguous packets while still producing the highest-quality reproduction for that number of bits. I also presented a novel method for highlighting regions of interest in the data prior to compressing it with these wavelet coders.
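To make the idea of an embedded encoding of scalar telemetry concrete, the sketch below applies a multi-level orthonormal Haar wavelet transform to a short series and orders the coefficients from largest to smallest magnitude, so that any prefix of the resulting stream yields a reconstruction. This is a deliberately simplified stand-in, not SPIHT and not the coder used in CAPTURE: it sends coefficient indices and full-precision values explicitly instead of bit planes with set-partitioned significance maps, the series length is assumed to be a power of two, and the sample readings are invented.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)

def haar_forward(x: Sequence[float]) -> List[float]:
    """Multi-level orthonormal Haar transform; len(x) must be a power of two.

    Output layout: [overall average, coarsest detail, ..., finest details].
    """
    out = list(x)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out

def haar_inverse(c: Sequence[float]) -> List[float]:
    """Invert haar_forward, merging one level at a time from coarse to fine."""
    out = list(c)
    n = 1
    while n < len(out):
        merged: List[float] = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out

def embedded_stream(x: Sequence[float]) -> List[Tuple[int, float]]:
    """(index, value) pairs ordered from largest to smallest magnitude."""
    return sorted(enumerate(haar_forward(x)), key=lambda iv: -abs(iv[1]))

def decode_prefix(stream: List[Tuple[int, float]], length: int, k: int) -> List[float]:
    """Reconstruct from the first k coefficients; missing ones are taken as zero."""
    coeffs = [0.0] * length
    for i, v in stream[:k]:
        coeffs[i] = v
    return haar_inverse(coeffs)

def rms_error(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

series = [10.0, 10.5, 11.0, 12.0, 20.0, 21.0, 21.5, 20.0]  # invented stand-in readings
stream = embedded_stream(series)
errors = [rms_error(series, decode_prefix(stream, len(series), k)) for k in range(1, 9)]
for k, e in enumerate(errors, start=1):
    print(f"{k} coefficients: RMS error {e:.3f}")
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the coefficients not yet received, so error can only fall as more of the stream arrives, and sending the largest coefficients first makes each prefix as useful as possible for this simple scheme.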

To fill the capability gap between transmitting individual images and transmitting summary statistics, this thesis presents a novel compression technique based on image synthesis. This strategy provides a very compact representation for imagery, exploiting inter-image redundancy while both communicating the visual 'gist' of an image in texture space and allowing texture statistics to be computed on the surface.
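The texture representation relies on a coarse grid of per-block texture-class labels. The sketch below shows one simple way such a grid could be produced, assigning each image block to the nearest of a few class centroids in a two-feature (mean, variance) space. The features, centroids, block size and function names are hypothetical; the texture classifier actually used in this thesis is not reproduced here.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

Image = List[List[float]]

def block_features(img: Image, r0: int, c0: int, size: int) -> Tuple[float, float]:
    """Mean and population variance of one size x size block (hypothetical features)."""
    pixels = [img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    return mean(pixels), pvariance(pixels)

def classify_blocks(img: Image, size: int,
                    centroids: Sequence[Tuple[float, float]]) -> List[List[int]]:
    """Return a low-resolution grid of texture-class labels, one per block."""
    rows, cols = len(img) // size, len(img[0]) // size
    grid = []
    for br in range(rows):
        row = []
        for bc in range(cols):
            m, v = block_features(img, br * size, bc * size, size)
            # Nearest centroid in (mean, variance) space.
            dists = [(m - cm) ** 2 + (v - cv) ** 2 for cm, cv in centroids]
            row.append(dists.index(min(dists)))
        grid.append(row)
    return grid

# Toy 8x8 image: flat left half ("smooth" class), checkerboard right half ("rough" class).
img = [[100.0 if c < 4 else (50.0 if (r + c) % 2 else 150.0) for c in range(8)]
       for r in range(8)]
grid = classify_blocks(img, 4, [(100.0, 0.0), (100.0, 2500.0)])
print(grid)  # [[0, 1], [0, 1]]
```

Only the small label grid (plus per-class texture parameters shared across images) needs to be transmitted; the surface can then synthesize each labeled region and compute texture statistics directly from the labels.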

CAPTURE takes a hybrid approach to data selection, incorporating both autonomous prioritization and feedback from human operators. New resources are automatically selected for compression and transmission to the surface unless human operators make specific requests, and there are clear opportunities to incorporate high-level autonomy algorithms into this selection process. Because operators can view the data being collected by an AUV in real time, they are better able to recognize anomalies and features of interest. Human feedback is explicitly incorporated into CAPTURE by allowing operators to identify scientifically valuable images or data segments for additional refinement, up to an arbitrarily high-quality reconstruction. This work represents the first example of human-driven data-quality selection for scalar telemetry from underwater AUVs.

6.1.2  Networking 

The ocean imposes challenges on underwater networks, including high latency, intermittent communication, the lack of instantaneous end-to-end connectivity, and a broadcast medium. This thesis uniquely employs a strategy of comprehensive data storage at every node and a broadcast-based selective acknowledgement protocol to combat these challenges. Relying on a "store and forward" architecture, rather than a simple relay chain, improves the performance of the relay link in poor conditions.
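The combination of per-node storage and broadcast selective acknowledgements (SACKs) can be sketched as follows. Each node keeps every fragment it has received, and a downstream node broadcasts a bitmap of what it holds; any node that overhears the bitmap and already stores a missing fragment can resend it, so gaps are filled hop by hop instead of end to end. The bitmap encoding, class and function names are illustrative assumptions, not the CAPTURE message format.

```python
from typing import Dict, List, Set

def sack_bitmap(held: Set[int], total: int) -> int:
    """Encode held fragment indices as an integer bitmap (bit i set = fragment i held)."""
    bits = 0
    for i in held:
        if 0 <= i < total:
            bits |= 1 << i
    return bits

def missing_from_bitmap(bits: int, total: int) -> List[int]:
    return [i for i in range(total) if not (bits >> i) & 1]

class Node:
    """A vehicle that stores every fragment it receives (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[int, bytes] = {}

    def receive(self, index: int, data: bytes) -> None:
        self.store[index] = data

    def fill_gaps(self, overheard_sack: int, total: int) -> Dict[int, bytes]:
        """Fragments the SACK sender lacks that this node can resend from storage."""
        return {i: self.store[i] for i in missing_from_bitmap(overheard_sack, total)
                if i in self.store}

TOTAL = 8
origin, relay, surface = Node("origin"), Node("relay"), Node("surface")
for i in range(TOTAL):
    origin.receive(i, bytes([i]))
for i in (0, 1, 3, 4, 6, 7):      # fragments 2 and 5 lost on the first hop
    relay.receive(i, bytes([i]))
for i in (0, 3, 7):               # the second hop is lossy too
    surface.receive(i, bytes([i]))

# The surface broadcasts its SACK; the relay resends what it already stores.
for i, data in relay.fill_gaps(sack_bitmap(set(surface.store), TOTAL), TOTAL).items():
    surface.receive(i, data)
# The relay's own SACK lets the origin fill the relay's gaps.
for i, data in origin.fill_gaps(sack_bitmap(set(relay.store), TOTAL), TOTAL).items():
    relay.receive(i, data)
print(sorted(surface.store), sorted(relay.store))
# [0, 1, 3, 4, 6, 7] [0, 1, 2, 3, 4, 5, 6, 7]
```

After one exchange the relay holds the complete resource and can fill the surface's remaining gaps on the next round. On a broadcast medium the origin could also act directly on the surface's SACK if it were within range.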

Most underwater networking research assumes the use of relatively low-cost, low-complexity, static communication nodes. These nodes may be augmented by one or two AUVs, but the fixed nodes form the backbone of the network. Fixed seafloor nodes cost 5-10 times less than even low-cost AUVs, yet untethered AUVs are the most practical option for accessing some environments. This thesis presents an approach targeted at very small relay networks of AUVs, using the small network size as an advantage.

6.2  Future  Work  and  Limitations 

Geographic  routing 

Recently there has been significant interest in geographically aware routing protocols [131, 135], which learn and construct routing tables based on known node locations. To that end, the one-way-travel-time capabilities in the current implementation of CAPTURE allow every vehicle to determine its range to any overheard transmitter with each transmission. That information is not currently used, but could be used in the future to aid in routing.
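With synchronized clocks, a one-way travel time converts to range by multiplying by the speed of sound. The sketch below assumes a nominal, uniform sound speed of 1500 m/s and ignores ray bending; a real deployment would use a measured or profile-derived sound speed, and the function name is hypothetical.

```python
NOMINAL_SOUND_SPEED_M_S = 1500.0  # assumption: uniform sound speed in sea water

def range_from_owtt(t_transmit_s: float, t_receive_s: float,
                    sound_speed_m_s: float = NOMINAL_SOUND_SPEED_M_S) -> float:
    """Straight-line range estimate from a one-way travel time measurement.

    Requires the transmitter's and receiver's clocks to be synchronized.
    """
    travel = t_receive_s - t_transmit_s
    if travel < 0:
        raise ValueError("receive time precedes transmit time; check clock sync")
    return travel * sound_speed_m_s

print(round(range_from_owtt(100.0, 101.2)))  # 1800
```

Collected over time for every overheard transmitter, such ranges could populate the neighbor tables that a geographically aware routing protocol would need.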

Incorporate  network  coding 

When using random linear network coding [64][18][63], instead of a single packet being transmitted, a linear combination of several packets is transmitted along with the linear coefficients. Once enough linearly independent combinations have been received, all of the original packets that were combined can be decoded. Used strategically, this reduces the frequency of ARQ (automatic repeat request) retransmissions required.
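The decoding step can be illustrated with the simplest possible field, GF(2), where a linear combination is the bitwise XOR of a random subset of packets and decoding is Gauss-Jordan elimination. This is a minimal sketch with hypothetical function names; practical schemes usually work over a larger field such as GF(2^8), which makes random combinations more likely to be independent.

```python
import random
from typing import List, Optional, Tuple

Coded = Tuple[int, bytes]  # (bitmask of source packets combined, XOR of their payloads)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(sources: List[bytes], rng: random.Random) -> Coded:
    """Emit one random GF(2) combination of equal-length source packets."""
    k = len(sources)
    mask = 0
    while mask == 0:                  # the all-zero combination carries no information
        mask = rng.getrandbits(k)
    payload = bytes(len(sources[0]))
    for i in range(k):
        if (mask >> i) & 1:
            payload = xor_bytes(payload, sources[i])
    return mask, payload

def decode(coded: List[Coded], k: int) -> Optional[List[bytes]]:
    """Gauss-Jordan elimination over GF(2); None until k independent packets arrive."""
    rows = [[mask, payload] for mask, payload in coded]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if (rows[r][0] >> col) & 1), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and (rows[r][0] >> col) & 1:
                rows[r][0] ^= rows[col][0]
                rows[r][1] = xor_bytes(rows[r][1], rows[col][1])
    return [rows[i][1] for i in range(k)]  # row i now combines only source i

rng = random.Random(7)
sources = [b"ping", b"pong", b"data", b"more"]  # equal-length stand-in packets
received: List[Coded] = []
while (result := decode(received, len(sources))) is None:
    received.append(encode(sources, rng))
print(f"decoded {len(sources)} packets from {len(received)} coded packets:", result == sources)
```

Because any k independent combinations suffice, a receiver only needs to report how many more combinations it requires, rather than which specific packets were lost, which is what allows fewer ARQ exchanges.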

Facilitate  handling  of  multiple  data  sources 

While there are no architectural barriers to transmitting data from multiple data sources, the current implementation does not provide any way to multiplex or differentiate between multiple sensors. There are clear opportunities for both automatic and user-guided approaches to selecting a data source, as well as opportunities to incorporate more advanced data interest detectors, such as image recognition algorithms, on a per-sensor basis.

Preview  size  selection 

In the current implementation of CAPTURE, the size of the preview transmitted before the origin moves on to a new resource is defined by the resource type. Automatic selection of the preview size would allow the origin to transmit higher-quality previews for images that it believes are more likely to be interesting. Simple image statistics such as entropy or luminance would likely be sufficient for this purpose.
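As a sketch of this idea, the origin could compute the Shannon entropy of an image's 8-bit grey-level histogram and scale the preview budget accordingly. The thresholds, the halving and doubling rule, and the function names below are hypothetical placeholders, not values used by CAPTURE; only the 2048-byte default echoes the preview size used in the field trial.

```python
import math
from collections import Counter
from typing import Iterable

def histogram_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits per pixel, of a grey-level histogram."""
    counts = Counter(pixels)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def preview_size(entropy_bits: float, default: int = 2048) -> int:
    """Hypothetical mapping from image entropy to preview size in bytes."""
    if entropy_bits < 2.0:      # nearly uniform scene, e.g. featureless sediment
        return default // 2
    if entropy_bits > 6.0:      # highly varied scene, more likely to hold features
        return default * 2
    return default

flat = [120] * 1000             # stand-in for a featureless image
varied = list(range(256)) * 4   # stand-in for a highly varied image
print(histogram_entropy(flat), preview_size(histogram_entropy(flat)))      # 0.0 1024
print(histogram_entropy(varied), preview_size(histogram_entropy(varied)))  # 8.0 4096
```

Entropy is cheap to compute on the vehicle and correlates with how much detail an image contains, although a high-entropy image is not necessarily a scientifically interesting one.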

User-driven  Region  of  Interest  selection 

SPIHT does not include any explicit information in the bitstream about which wavelet coefficients are being encoded, making it challenging to refine only specific regions of an image. Other embedded wavelet compressors, including WDR [120], do support this capability at a small cost to image quality.

Artifact  Reduction  for  Image  Synthesis 

Transmitting texture information as a low-resolution grid results in blocking artifacts in the reconstructed image. Additionally, texture class boundaries are currently handled no differently than constant-texture areas during image synthesis. Denser encoding of texture information could reduce the blocking artifacts, and explicit handling of class boundaries could address the second issue. There is also a body of literature on blocking artifact reduction in Vector Quantization that may be relevant.

Progressive  Synthesis  Mask  Transmission 

At an average of 40 bytes, individual texture masks are easy to transmit in a single transmission. For higher-resolution masks with a larger number of texture classes, the size of the texture mask could grow significantly. As masks grow, a progressive transmission scheme for individual texture class masks based on shape-adaptive SPIHT coding techniques (e.g. [66]) could be worth investigating.
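The growth of the mask with resolution and class count is easy to quantify for an uncompressed label grid, where each cell needs ceil(log2 K) bits for K classes. The grid dimensions below are illustrative, not those used in this thesis, and this raw fixed-length size ignores any entropy coding a real encoder might apply; the function name is hypothetical.

```python
import math

def raw_mask_bytes(rows: int, cols: int, classes: int) -> int:
    """Size of a fixed-length-coded label grid, rounded up to whole bytes."""
    bits_per_cell = max(1, math.ceil(math.log2(classes)))
    return math.ceil(rows * cols * bits_per_cell / 8)

for rows, cols, classes in [(12, 16, 4), (24, 32, 4), (24, 32, 16), (48, 64, 16)]:
    size = raw_mask_bytes(rows, cols, classes)
    print(f"{rows}x{cols} grid, {classes} classes: {size} bytes")
# 48, 192, 384 and 1536 bytes respectively
```

Doubling the grid resolution in each dimension quadruples the mask, and each doubling of the class count adds one bit per cell, so a mask that fits comfortably in one transmission can quickly require many.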
Synthesis/Wavelet Hybrid Image Compression

Image Synthesis is effective at encoding repetitive background textures (like sand), but does not encode objects or unrecognized texture areas. A hybrid approach could be developed that uses Image Synthesis to compress the background of each image, and embedded wavelet compression to encode any 'significant' foreground objects, based on some significance metric.


6.3  Modem  Suggestions 

Although most testing and development took place on the WHOI Micro-Modem, this work is designed to be independent of the specific physical layer used for communication. The modem provides as robust a physical layer as can be hoped for given the underwater environment, yet its interface to higher-level applications imposes several limitations. Many of these limitations stem from the age of the underlying hardware, which is in the process of a major revision. Three key changes that would make the Micro-Modem easier to use with AUVs are:

•  A  greater  independence  between  MTU  and  FEC 

•  The  ability  to  include  custom  header  metadata  in  packets 

•  A  "raw"  modulation  mode  and  "best-effort"  decoding 

The Maximum Transmission Unit (MTU) of the WHOI Micro-Modem depends upon the level of FEC selected for channel coding. Depending on the specific level of coding selected, the MTU may be 32, 64, or 256 bytes. This variability makes it challenging to select an FEC level based solely on the current channel quality, as the choice imposes constraints on the higher-level network layers. Indeed, 32 bytes is a relatively small MTU, and imposes fragmentation on all but the most trivial of messages. Even with 64-byte frames, a few bytes of header metadata represent a significant cost. In [7] an optimal packet size is derived for multi-hop relay networks, based upon simulated results with two realistic MAC protocols. The optimum size varies with the specific simulation parameters, but is generally a few hundred to a few thousand bytes.
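The cost of small frames can be seen by counting fragments and header overhead for each MTU. The sketch below splits the 4368-byte encoded speech clip described earlier into frames of 32, 64 and 256 bytes, assuming a hypothetical 4-byte per-frame header; the header size and function name are illustrative, not CAPTURE's actual values.

```python
import math
from typing import Tuple

def fragmentation_cost(payload_bytes: int, mtu: int, header_bytes: int) -> Tuple[int, float]:
    """Return (frames needed, fraction of transmitted bytes spent on headers)."""
    if header_bytes >= mtu:
        raise ValueError("header does not fit in the frame")
    per_frame = mtu - header_bytes
    frames = math.ceil(payload_bytes / per_frame)
    overhead = frames * header_bytes / (payload_bytes + frames * header_bytes)
    return frames, overhead

HEADER = 4  # hypothetical per-frame header size, in bytes
for mtu in (32, 64, 256):
    frames, overhead = fragmentation_cost(4368, mtu, HEADER)
    print(f"MTU {mtu:>3}: {frames:>3} frames, {overhead:.1%} header overhead")
```

Under these assumptions the 32-byte MTU needs 156 frames and spends 12.5% of the transmitted bytes on headers, against 18 frames and under 2% for the 256-byte MTU; every additional frame is also another opportunity for loss and another slot to schedule.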

Packet transmissions from the Micro-Modem contain a heavily protected metadata header, including a 7-bit source ID and a 7-bit destination ID. Even when the bulk of a message is lost, this header information frequently remains decodable. The ability to include custom, heavily protected metadata in this header would allow routing information and other critical metadata to receive strong protection, even when the FEC applied to the rest of the frame is weak.

When a packet is received by the Micro-Modem, a CRC, or "checksum", is computed. If the computed checksum does not match the one encoded in the packet, the packet is dropped and not presented to the user. This makes it challenging to apply error correction customized to a particular dataset (though not impossible, as described further below). If the modem reported a best-estimate decoding in the case of failed CRCs, possibly along with hard decisions from the equalizer, FEC could be applied in software as well as in the modem firmware.
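One example of what best-effort output would enable is software packet combining. If the modem delivered its best-estimate bits for frames whose CRC failed, the application could keep several corrupted copies of the same frame (from retransmissions, or copies overheard on the broadcast channel) and take a bitwise majority vote, accepting the result only if its CRC checks. The sketch below assumes such an interface exists, which the current Micro-Modem does not provide; zlib's CRC-32 stands in for the modem's actual checksum, and the function names are hypothetical.

```python
import zlib
from typing import List, Optional

def frame_with_crc(payload: bytes) -> bytes:
    """Append a CRC-32 (stand-in for the modem's own checksum)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def crc_ok(frame: bytes) -> bool:
    return zlib.crc32(frame[:-4]).to_bytes(4, "big") == frame[-4:]

def majority_combine(copies: List[bytes]) -> Optional[bytes]:
    """Bitwise majority vote over an odd number of equal-length corrupted copies."""
    if not copies or len(copies) % 2 == 0 or len({len(c) for c in copies}) != 1:
        return None
    out = bytearray()
    for column in zip(*copies):          # the same byte position in every copy
        byte = 0
        for bit in range(8):
            ones = sum((b >> bit) & 1 for b in column)
            if ones * 2 > len(copies):
                byte |= 1 << bit
        out.append(byte)
    combined = bytes(out)
    return combined if crc_ok(combined) else None

def flip_bit(frame: bytes, position: int) -> bytes:
    data = bytearray(frame)
    data[position // 8] ^= 1 << (position % 8)
    return bytes(data)

original = frame_with_crc(b"example telemetry frame")
copies = [flip_bit(original, 3), flip_bit(original, 40), flip_bit(original, 101)]
print([crc_ok(c) for c in copies])           # [False, False, False]
print(majority_combine(copies) == original)  # True
```

Every copy fails its CRC and would be silently dropped by the current modem, yet together they contain enough information to recover the frame; that information is only usable if the modem passes failed frames up to the application.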

As a direct result of this work, each of these interface suggestions is being actively considered for incorporation into the next version of the WHOI Micro-Modem, and some are already being implemented.